Channel: Grant Winney

SO Vault: What is your most useful SQL trick to avoid writing more code?

StackOverflow sees quite a few threads deleted, usually for good reasons. Among the stinkers, though, lies the occasionally useful or otherwise interesting one, deleted by some pedantic nitpicker - so I resurrect them. 👻

Note: Because these threads are older, info may be outdated and links may be dead. Feel free to contact me, but I may not update them... this is an archive after all.


What is your most useful SQL trick to avoid writing more code?

Question asked by EvilTeach

I am intending this to be an entry which is a resource for anyone to find out about aspects of SQL that they may have not run into yet, so that the ideas can be stolen and used in their own programming. With that in mind...

What SQL tricks have you personally used, that made it possible for you to do less actual real world programming to get things done?

[EDIT]

A fruitful area of discussion would be specific techniques that allow you to do operations on the database side, that make it unnecessary to pull the data back to the program, then update/insert it back to the database.

[EDIT]

I recommend that you flesh out your answer where possible to make it easy for the reader to understand the value that your technique provides. Visual examples work wonders. The winning answer will have good examples.

My thanks to everyone who shared an idea with the rest of us.

Comments

Another way to phrase this question is "what good programming practices have you disregarded to spill logic between concerns and make miserable the poor chap who has to come after you and try to make changes?" Not that pragmatism is a bad thing :) – Rex M Jan 28 '09 at 19:18


Answer by Rad

This statement can save you hours and hours of programming

insert into ... select ... from

For example:
INSERT INTO CurrentEmployee SELECT * FROM Employee WHERE FireDate IS NULL; will populate your new table with existing data. It avoids the need to do an ETL operation or use multiple insert statements to load your data.
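
A minimal sketch of the same idea, assuming Python's built-in sqlite3 module and an in-memory database. The column layout and sample rows are hypothetical; only the table names and the INSERT ... SELECT statement come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, FireDate TEXT);
    CREATE TABLE CurrentEmployee (Id INTEGER PRIMARY KEY, Name TEXT, FireDate TEXT);
    -- Hypothetical sample data: Bob has been fired, the others have not.
    INSERT INTO Employee VALUES (1, 'Ann', NULL), (2, 'Bob', '2008-11-03'), (3, 'Cid', NULL);
""")

# One statement copies every matching row; no client-side loop is needed.
conn.execute("INSERT INTO CurrentEmployee SELECT * FROM Employee WHERE FireDate IS NULL")

current_names = [row[0] for row in conn.execute("SELECT Name FROM CurrentEmployee ORDER BY Id")]
print(current_names)
```

SELECT * only works here because both tables have the same columns in the same order; naming the columns explicitly is the safer habit.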


Answer by EvilTeach

I think the most useful one I have used is the WITH statement.

It allows subquery reuse, so a single query invocation can do what would normally take two or more invocations and a temporary table.

In Oracle, the WITH statement creates inline views or uses a temporary table as needed.

Here is a silly example

WITH 
mnssnInfo AS
(
    SELECT SSN, 
           UPPER(LAST_NAME)  AS LAST_NAME,  -- aliased so later subqueries can use the name
           UPPER(FIRST_NAME) AS FIRST_NAME, 
           TAXABLE_INCOME,          
           CHARITABLE_DONATIONS
    FROM IRS_MASTER_FILE
    WHERE STATE = 'MN'                 AND -- limit to Minne-so-tah
          TAXABLE_INCOME > 250000      AND -- is rich 
          CHARITABLE_DONATIONS > 5000      -- might donate too
),
doltishApplicants AS
(
    SELECT SSN, SAT_SCORE, SUBMISSION_DATE
    FROM COLLEGE_ADMISSIONS
    WHERE SAT_SCORE < 100          -- Not as smart as some others.
),
todaysAdmissions AS
(
    SELECT doltishApplicants.SSN, 
           TRUNC(SUBMISSION_DATE)  SUBMIT_DATE, 
           LAST_NAME, FIRST_NAME, 
           TAXABLE_INCOME
    FROM mnssnInfo,
         doltishApplicants
    WHERE mnssnInfo.SSN = doltishApplicants.SSN
)
SELECT 'Dear ' || FIRST_NAME || 
       ' your admission to WhatsaMattaU has been accepted.'
FROM todaysAdmissions
WHERE SUBMIT_DATE = TRUNC(SYSDATE)    -- For stuff received today only

Another thing I like about it is that this form lets you separate the filtering from the joining. As a result, you can frequently copy out the subqueries and execute them standalone to view their result sets.


Answer by GBa (Jan 28, 2009)

Writing "WHERE 1=1 ..." at the start of a generated statement. That way every condition can be appended as "AND ..." and you don't have to keep track of whether it is the first one.
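
A short sketch of how this plays out in generated SQL, assuming Python's built-in sqlite3 module. The build_query helper, the people table and its rows are all hypothetical names made up for illustration.

```python
import sqlite3

def build_query(filters):
    """Build a parameterized SELECT; `filters` maps column name -> required value."""
    sql = "SELECT name FROM people WHERE 1=1"
    params = []
    for column, value in filters.items():
        # Every condition is appended the same way; there is no "first condition" special case.
        # Column names must come from trusted code, never from user input.
        sql += f" AND {column} = ?"
        params.append(value)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, city TEXT, age INTEGER);
    INSERT INTO people VALUES ('Ann', 'Oslo', 30), ('Bob', 'Oslo', 41), ('Cid', 'Rome', 30);
""")

sql_all, params_all = build_query({})
sql_some, params_some = build_query({"city": "Oslo", "age": 30})
all_names = sorted(r[0] for r in conn.execute(sql_all, params_all))
some_names = [r[0] for r in conn.execute(sql_some, params_some)]
```

With no filters the statement is still valid SQL, which is the whole point of the dummy 1=1 condition.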


Answer by Eric Johnson

Copy a table without copying the data:

select * into new_table from old_table where 1=0

OR

SELECT TOP(0) * INTO NEW_TABLE FROM OLD_TABLE
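
SELECT ... INTO is SQL Server syntax. As a rough sketch of the same trick elsewhere, the following assumes Python's built-in sqlite3 module, where the equivalent form is CREATE TABLE ... AS SELECT; the column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table (id INTEGER, label TEXT, price REAL);
    INSERT INTO old_table VALUES (1, 'a', 1.5), (2, 'b', 2.5);
    -- WHERE 1=0 is never true, so only the column layout is copied.
    CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=0;
""")

# PRAGMA table_info returns one row per column; index 1 is the column name.
new_columns = [row[1] for row in conn.execute("PRAGMA table_info(new_table)")]
new_row_count = conn.execute("SELECT COUNT(*) FROM new_table").fetchone()[0]
```

In SQLite this copies column names only, not constraints or indexes, so those have to be recreated separately if they matter.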

Answer by Mark Harrison

My old office-mate was an extreme SQL enthusiast. So whenever I would complain "Oh dear, this SQL stuff is so hard, I don't think there's any way to solve this in SQL, I'd better just loop over the data in C++, blah blah," he would jump in and do it for me.


Answer by geofftnz

Use Excel to generate SQL. This is especially useful when someone emails you a spreadsheet full of rubbish with a request to "update the system" with their modifications.

  A       B       C
1 BlahID  Value   SQL Generation
2 176     12.76   ="UPDATE Blah SET somecolumn=" & B2 & " WHERE BlahID=" & A2
3 177     10.11   ="UPDATE Blah SET somecolumn=" & B3 & " WHERE BlahID=" & A3
4 178      9.57   ="UPDATE Blah SET somecolumn=" & B4 & " WHERE BlahID=" & A4

You do need to be careful though because people will have a column for something like UnitPrice and have 999 valid entries and one containing "3 bucks 99 cents".

Also "I have highlighted set A in yellow and set B in green. Put the green ones in the database." grrr.

EDIT: Here's what I actually use for Excel->SQL. I've got a couple of VBA functions that sit in an XLA file that's loaded by Excel on startup. Apologies for any bugs - it's a quick dirty hack that's nonetheless saved me a bucketload of time over the past few years.

Public Function SQL_Insert(tablename As String, columnheader As Range, columntypes As Range, datarow As Range) As String

    Dim sSQL As String
    Dim scan As Range
    Dim i As Integer
    Dim t As String
    Dim v As Variant

    sSQL = "insert into " & tablename & "("

    i = 0

    For Each scan In columnheader.Cells
        If i > 0 Then sSQL = sSQL & ","
        sSQL = sSQL & scan.Value
        i = i + 1
    Next

    sSQL = sSQL & ") values("

    For i = 1 To datarow.Columns.Count

        If i > 1 Then sSQL = sSQL & ","

        If LCase(datarow.Cells(1, i).Value) = "null" Then

            sSQL = sSQL & "null"

        Else

            t = Left(columntypes.Cells(1, i).Value, 1)

            Select Case t
                Case "n": sSQL = sSQL & datarow.Cells(1, i).Value
                Case "t": sSQL = sSQL & "'" & Replace(datarow.Cells(1, i).Value, "'", "''") & "'"
                Case "d": sSQL = sSQL & "'" & Excel.WorksheetFunction.Text(datarow.Cells(1, i).Value, "dd-mmm-yyyy") & "'"
                Case "x": sSQL = sSQL & datarow.Cells(1, i).Value
            End Select
        End If
    Next

    sSQL = sSQL & ")"

    SQL_Insert = sSQL

End Function

Public Function SQL_CreateTable(tablename As String, columnname As Range, columntypes As Range) As String

    Dim sSQL As String

    sSQL = "create table " & tablename & "("

    Dim scan As Range
    Dim i As Integer
    Dim t As String

    For i = 1 To columnname.Columns.Count

        If i > 1 Then sSQL = sSQL & ","

        t = columntypes.Cells(1, i).Value
        sSQL = sSQL & columnname.Cells(1, i).Value & " " & Right(t, Len(t) - 2)

    Next

    sSQL = sSQL & ")"

    SQL_CreateTable = sSQL

End Function

The way to use them is to add an extra row to your spreadsheet to specify column types. The format of this row is "x sqltype" where x is the type of data (t = text, n = numeric, d = datetime) and sqltype is the type of the column for the CREATE TABLE call. When using the functions in formulas, put dollar signs before the row references to lock them so they don't change when doing a fill-down.

eg:

Name           DateOfBirth  PiesPerDay    SQL
t varchar(50)  d datetime   n int         =SQL_CreateTable("#tmpPies",A1:C1,A2:C2)
Dave           15/08/1979   3             =sql_insert("#tmpPies",A$1:C$1,A$2:C$2,A3:C3)
Bob            9/03/1981    4             =sql_insert("#tmpPies",A$1:C$1,A$2:C$2,A4:C4)
Lisa           16/09/1986   1             =sql_insert("#tmpPies",A$1:C$1,A$2:C$2,A5:C5)

Which gives you:

create table #tmpPies(Name varchar(50),DateOfBirth datetime,PiesPerDay int)
insert into #tmpPies(Name,DateOfBirth,PiesPerDay) values('Dave','15-Aug-1979',3)
insert into #tmpPies(Name,DateOfBirth,PiesPerDay) values('Bob','09-Mar-1981',4)
insert into #tmpPies(Name,DateOfBirth,PiesPerDay) values('Lisa','16-Sep-1986',1)

Answer by Ric Tokyo

I personally use the CASE statement a lot. Here are some links on it, but I also suggest googling.

4 guys from Rolla

Microsoft technet

Quick example:

SELECT FirstName, LastName, Salary, DOB, CASE Gender 
                                            WHEN 'M' THEN 'Male' 
                                            WHEN 'F' THEN 'Female' 
                                         END 
FROM Employees

Answer by Patrick Cuff

I like to use SQL to generate more SQL.

For example, I needed a query to count the number of items across specific categories, where each category is stored in its own table. I used the following query against the master category table to generate the queries I needed (this is for Oracle):

select 'select '
    || chr(39) || trim(cd.authority) || chr(39) || ', ' 
    || chr(39) || trim(category) || chr(39) || ', '
    || 'count (*) from ' || trim(table_name) || ';'
from   category_table_name ctn
     , category_definition cd
where  ctn.category_id = cd.category_id
and    cd.authority = 'DEFAULT'
and    category in ( 'CATEGORY 1'
                   , 'CATEGORY 2'
                   ...
                   , 'CATEGORY N'
                   )
order by cd.authority
       , category;

This generated a file of SELECT queries that I could then run:

select 'DEFAULT', 'CATEGORY 1', count (*) from TABLE1; 
select 'DEFAULT', 'CATEGORY 2', count (*) from TABLE4; 
...
select 'DEFAULT', 'CATEGORY N', count (*) from TABLE921; 

Answer by Powerlord

Besides normalization (the obvious one), setting the ON UPDATE and ON DELETE clauses of my foreign keys correctly saves me time, particularly ON DELETE SET NULL and ON UPDATE CASCADE.
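
A small sketch of what those two clauses do, assuming Python's built-in sqlite3 module. The department and employee tables are hypothetical, and the PRAGMA is a SQLite-specific requirement, since SQLite does not enforce foreign keys unless asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES department(id)
            ON DELETE SET NULL   -- orphaned employees keep their row, lose the link
            ON UPDATE CASCADE    -- renumbering a department follows through automatically
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO employee VALUES (1, 'Ann', 10), (2, 'Bob', 20);
    UPDATE department SET id = 11 WHERE id = 10;
    DELETE FROM department WHERE id = 20;
""")

employee_rows = conn.execute("SELECT name, department_id FROM employee ORDER BY id").fetchall()
```

No application code touched the employee table, yet both rows are consistent with the department changes.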


Answer by JosephStyons (Jan 28, 2009)

I have found it very useful to interact with the database through views, which can be adjusted without any changes to code (except, of course, SQL code).


Answer by Terrapin

When developing pages in ASP.NET that need to utilize a GridView control, I like to craft the query with user-friendly field aliases. That way, I can simply set the GridView.AutoGenerateColumns property to true, and not spend time matching HeaderText properties to columns.

select
    MyDateCol 'The Date',
    MyUserNameCol 'User name'
from MyTable

Answer by Paul Chernoch (Jun 25, 2009)

Date arithmetic and processing drive me crazy. I got this idea from The Data Warehouse Toolkit by Ralph Kimball.

Create a table called CALENDAR that has one record for each day going back as far as you need to go, say from 1900 to 2100. Then index it by several columns - say the day number, day of week, month, year, etc. Add these columns:

ID
DATE
DAY_OF_YEAR
DAY_OF_WEEK
DAY_OF_WEEK_NAME
MONTH
MONTH_NAME
IS_WEEKEND
IS_HOLIDAY
YEAR
QUARTER
FISCAL_YEAR
FISCAL_QUARTER
BEGINNING_OF_WEEK_YEAR
BEGINNING_OF_WEEK_ID
BEGINNING_OF_MONTH_ID
BEGINNING_OF_YEAR_ID
ADD_MONTH
etc.

Add as many columns as are useful to you. What does this buy you? You can use this approach in any database and not worry about the DATE function syntax. You can find missing dates in data by using outer joins. You can define multi-national holiday schemes. You can work in fiscal and calendar years equally well. You can do ETL that converts from words to dates with ease. The host of time-series related queries that this simplifies is incredible.
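
As one illustration of the "find missing dates with outer joins" point, here is a minimal sketch assuming Python's built-in sqlite3 module. The calendar table is cut down to two hypothetical columns (day, is_weekend) and covers only one week; the invoice table and its rows are made up.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (day TEXT PRIMARY KEY, is_weekend INTEGER);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, invoice_date TEXT);
    INSERT INTO invoice (invoice_date) VALUES ('2009-01-01'), ('2009-01-02'), ('2009-01-05');
""")

# Populate one calendar row per day for the first week of January 2009.
start = date(2009, 1, 1)
days = [start + timedelta(days=n) for n in range(7)]
conn.executemany(
    "INSERT INTO calendar VALUES (?, ?)",
    [(d.isoformat(), 1 if d.weekday() >= 5 else 0) for d in days],
)

# Calendar rows with no matching invoice are the gaps; weekends are filtered by a plain column.
missing_weekdays = [row[0] for row in conn.execute("""
    SELECT c.day
    FROM calendar c
    LEFT JOIN invoice i ON i.invoice_date = c.day
    WHERE i.id IS NULL AND c.is_weekend = 0
    ORDER BY c.day
""")]
```

The query contains no date functions at all, which is why the approach ports between databases so easily.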


Answer by Binoj Antony

In SQL Server 2005/2008, to show row numbers in a SELECT query result:

SELECT ( ROW_NUMBER() OVER (ORDER BY OrderId) ) AS RowNumber,
        GrandTotal, CustomerId, PurchaseDate
FROM Orders

The ORDER BY inside OVER() is compulsory. The OVER() clause tells the SQL engine to sort data on the specified column (in this case OrderId) and assign numbers as per the sort results.


Answer by J. Polfer

The two things I found most helpful were recursive queries in Oracle using the CONNECT BY syntax, which saves writing a tool to do the query for you, and the new windowing functions, which perform various calculations over groups of data.

Recursive Hierarchical Query Example (note: only works with Oracle; you can do something similar in other databases that support recursive SQL, cf. book I mention below):

Assume you have a table, testtree, in a database that manages Quality Assurance efforts for a software product you are developing, that has categories and tests attached to those categories:

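CREATE TABLE testtree(
   id INTEGER PRIMARY KEY,
   parentid  INTEGER REFERENCES testtree(id),  -- self-reference to the parent node
   categoryname VARCHAR2(100),                 -- length is an assumption
   testlocation VARCHAR2(400));                -- file path stored as text; length is an assumption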

Example Data in table:
id|parentid|categoryname|testlocation
-------------------------------------
00|NULL|ROOT|NULL
01|00|Frobjit 1.0|NULL
02|01|Regression|NULL
03|02|test1 - startup tests|/src/frobjit/unit_tests/startup.test
04|02|test2 - closing tests|/src/frobjit/unit_tests/closing.test
05|02|test3 - functionality test|/src/frobjit/unit_tests/functionality.test
06|01|Functional|NULL
07|06|Master Grand Functional Test Plan|/src/frobjit/unit_tests/grand.test
08|00|Whirlgig 2.5|NULL
09|08|Functional|NULL
10|09|functional-test-1|/src/whirlgig/unit_tests/test1.test
(...)

I hope you get the idea of what's going on in the above snippet. Basically, there is a tree structure being described in the above database; you have a root node, with a Frobjit 1.0 and Whirlgig 2.5 node being described beneath it, with Regression and Functional nodes beneath Frobjit, and a Functional node beneath Whirlgig, all the way down to the leaf nodes, which contain filepaths to unit tests.

Suppose you want to get the filepaths of all unit tests for Frobjit 1.0. To query on this database, use the following query in Oracle:

SELECT testlocation
   FROM testtree
START WITH categoryname = 'Frobjit 1.0'
CONNECT BY PRIOR id=parentid;
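
For databases without CONNECT BY, a recursive common table expression can walk the same tree. The following is a sketch only, assuming Python's built-in sqlite3 module and SQLite's WITH RECURSIVE; the subtree name is hypothetical, and the sample rows repeat the data listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testtree (
        id INTEGER PRIMARY KEY,
        parentid INTEGER REFERENCES testtree(id),
        categoryname TEXT,
        testlocation TEXT
    );
    INSERT INTO testtree VALUES
        (0, NULL, 'ROOT', NULL),
        (1, 0, 'Frobjit 1.0', NULL),
        (2, 1, 'Regression', NULL),
        (3, 2, 'test1 - startup tests', '/src/frobjit/unit_tests/startup.test'),
        (4, 2, 'test2 - closing tests', '/src/frobjit/unit_tests/closing.test'),
        (5, 2, 'test3 - functionality test', '/src/frobjit/unit_tests/functionality.test'),
        (6, 1, 'Functional', NULL),
        (7, 6, 'Master Grand Functional Test Plan', '/src/frobjit/unit_tests/grand.test'),
        (8, 0, 'Whirlgig 2.5', NULL),
        (9, 8, 'Functional', NULL),
        (10, 9, 'functional-test-1', '/src/whirlgig/unit_tests/test1.test');
""")

frobjit_tests = [row[0] for row in conn.execute("""
    WITH RECURSIVE subtree(id, testlocation) AS (
        -- Anchor: plays the role of START WITH.
        SELECT id, testlocation FROM testtree WHERE categoryname = 'Frobjit 1.0'
        UNION ALL
        -- Recursive step: plays the role of CONNECT BY PRIOR id = parentid.
        SELECT t.id, t.testlocation
        FROM testtree t JOIN subtree s ON t.parentid = s.id
    )
    SELECT testlocation FROM subtree
    WHERE testlocation IS NOT NULL   -- category nodes carry no file path
    ORDER BY id
""")]
```

Unlike the Oracle query above, this version filters out the NULL paths of the category nodes, which is a choice made for the sketch rather than part of the original answer.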

A good book that explains a LOT of techniques to reduce programming time is Anthony Molinaro's SQL Cookbook.


Answer by EvilTeach (Jan 29, 2009)

In some of my older code, I issue a SELECT COUNT(*) in order to see how many rows there are, so that we can allocate enough memory to load the entire result set. Next we do a query to select the actual data.

One day it hit me.

WITH 
base AS
(
    SELECT COL1, COL2, COL3
    FROM SOME-TABLE
    WHERE SOME-CONDITION
)
SELECT COUNT(*) OVER () AS ROW_COUNT, -- analytic count: repeats the total on every row
       COL1, COL2, COL3
FROM base;

That gives me the number of rows, on the first row (and all the rest).

So I can read the first row, allocate the array, then store the first row, then load the rest in a loop.

One query, doing the work that two queries did.


Answer by Michael Buen

Knowing the specifics of your RDBMS, so you can write more concise code.

  • Concatenate strings without using loops. MSSQL:
    declare @t varchar(max) -- null initially
    select @t = coalesce(@t + ', ' + name, name) from entities order by name;
    print @t
    alternatively:
    declare @s varchar(max)
    set @s = ''
    select @s = @s + name + ', ' from entities order by name
    print substring(@s,1,len(@s)-1)
  • Add an autonumber field to make it easier to delete duplicate records (leaving one copy). PostgreSQL, MSSQL, MySQL:

    http://mssql-to-postgresql.blogspot.com/2007/12/deleting-duplicates-in-postgresql-ms.html

  • Updating table from other table. PostgreSQL, MSSQL, MySQL:

    http://mssql-to-postgresql.blogspot.com/2007/12/updates-in-postgresql-ms-sql-mysql.html

  • getting the most recent row of child table.

    PostgreSQL-specific:

    SELECT DISTINCT ON (c.customer_id) 
    c.customer_id, c.customer_name, o.order_date, o.order_amount, o.order_id 
    FROM customers c LEFT JOIN orders O ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_date DESC, o.order_id DESC;

    Contrast with other RDBMSs, which don't support DISTINCT ON:

    select 
    c.customer_id, c.customer_name, o.order_date, o.order_amount, o.order_id 
    from customers c left join
    (
        select customer_id, max(order_date) as recent_date
        from orders 
        group by customer_id
    ) x on x.customer_id = c.customer_id
    left join orders o on o.customer_id = c.customer_id 
    and o.order_date = x.recent_date
    order by c.customer_id
  • Concatenating strings on RDBMS-level(more performant) rather than on client-side:

    http://www.christianmontoya.com/2007/09/14/mysql-group_concat-this-query-is-insane/

    http://mssql-to-postgresql.blogspot.com/2007/12/cool-groupconcat.html

  • Leverage the mappability of boolean to integer:

    MySQL-specific (boolean == int), most concise:

    select entity_id, sum(score > 15)
    from scores
    group by entity_id

    Contrast with PostgreSQL:

    select entity_id, sum((score > 15)::int)
    from scores
    group by entity_id

    Contrast with MSSQL: there is no first-class boolean and a comparison cannot be cast to integer, so you need to jump through extra hoops:

    select entity_id, sum(case when score > 15 then 1 else 0 end)
    from scores
    group by entity_id
  • Use generate_series to report gaps in autonumber or missing dates. The next version of PostgreSQL (8.4) will have a generate_series specifically for dates:

    select '2009-1-1'::date + n as missing_date 
    from generate_series(0, '2009-1-31'::date - '2009-1-1'::date) as dates(n)
    where '2009-1-1'::date + dates.n not in (select invoice_date from invoice)

Answer by Beska

This doesn't save "programming" time, per se, but sure can save a lot of time in general, if you're looking for a particular stored proc that you don't know the name of, or trying to find all stored procs where something is being modified, etc. A quick query for SQL Server to list stored procs that have a particular string somewhere within them.

SELECT ROUTINE_NAME, ROUTINE_DEFINITION 
FROM INFORMATION_SCHEMA.ROUTINES 
WHERE ROUTINE_DEFINITION LIKE '%foobar%' 
AND ROUTINE_TYPE='PROCEDURE'

Same for Oracle:

select name, text
from user_source u
where lower(u.text) like '%foobar%'
and type = 'PROCEDURE';

Answer by indigo80

(Very easy trick - this post is that long only because I'm trying to fully explain what's going on. Hope you like it.)

Summary

By passing in optional values you can have the query ignore specific WHERE clauses. This effectively makes that particular clause become a 1=1 statement. Awesome when you're not sure what optional values will be provided.

Details

Instead of writing a lot of similar queries just for different filter combinations, just write one and exploit boolean logic. I use it a lot in conjunction with typed datasets in .NET. For example, let's say we have a query like this:

select id, name, age, rank, hometown from .........;

We've created a fill/get method that loads all data. Now, when we need to filter by id, we add another fill/get method:

select id, name, age, rank, hometown from ..... where id=@id;

Then we need to filter by name and hometown - next method:

select id, name, age, rank, hometown from .... where name=@name and hometown=@hometown;

Suppose now we need to filter by all the other columns and their combinations. We quickly end up with a mess of similar methods: one for name and hometown, one for rank and age, one for rank, age and name, etc.

One option is to build a suitable query programmatically; the other, much simpler, is to use one fill/get method that provides all filtering possibilities:

select id, name, age, rank, hometown from .....
where
(@id = -1 OR id = @id) AND
(@name = '*' OR name = @name OR (@name is null AND name is null)) AND
(@age = -1 OR age = @age OR (@age is null AND age is null)) AND
(@rank = '*' OR rank = @rank OR (@rank is null AND rank is null)) AND
(@hometown = '*' OR hometown = @hometown OR (@hometown is null AND hometown is null));

Now we have all possible filterings in one query. Let's say get method name is get_by_filters with signature:

get_by_filters(int id, string name, int? age, string rank, string hometown)

Want to filter just by name?:

get_by_filters(-1,"John",-1,"*","*");

By age and rank where hometown is null?:

get_by_filters(-1, "*", 23, "some rank", null);

etc. etc.

Just one method, one query and all filter combinations. It saved me a lot of time.

One drawback is that you have to "reserve" an integer/string value for the "doesn't matter" filter. But you shouldn't expect an id of -1 or a person named '*' (of course this is context dependent), so it's not a big problem IMHO.


Edit:

Just to quickly explain the mechanism, let's take a look at first line after where:

 (@id = -1 OR id = @id) AND ...

When parameter @id is set to -1 the query becomes:

(-1 = -1 OR id = -1) AND ...

Because -1 = -1 is always true, the whole OR is true whatever id holds, so this clause filters nothing out.

If parameter @id was set to, let's say, 77:

(77 = -1 OR id = 77) AND ...

then 77 = -1 is obviously false, so the clause passes only rows whose id column equals 77. Same for other parameters. This is really easy yet powerful.
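
The whole pattern can be tried out with a small sketch, assuming Python's built-in sqlite3 module and named parameters in place of the @-style ones. The people table, its rows and the default arguments of get_by_filters are hypothetical; the rank column is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, hometown TEXT);
    INSERT INTO people VALUES (1, 'John', 23, NULL), (2, 'John', 40, 'Leeds'), (3, 'Mary', 23, 'Leeds');
""")

# -1 and '*' are the reserved "doesn't matter" values; NULL means "column must be NULL".
QUERY = """
    SELECT id FROM people
    WHERE (:id = -1 OR id = :id)
      AND (:name = '*' OR name = :name OR (:name IS NULL AND name IS NULL))
      AND (:age = -1 OR age = :age OR (:age IS NULL AND age IS NULL))
      AND (:hometown = '*' OR hometown = :hometown
           OR (:hometown IS NULL AND hometown IS NULL))
    ORDER BY id
"""

def get_by_filters(person_id=-1, name="*", age=-1, hometown="*"):
    """Run the single query; any argument left at its default is ignored as a filter."""
    params = {"id": person_id, "name": name, "age": age, "hometown": hometown}
    return [row[0] for row in conn.execute(QUERY, params)]

johns = get_by_filters(name="John")
age_23_no_hometown = get_by_filters(age=23, hometown=None)
everyone = get_by_filters()
```

One caveat worth keeping in mind: a catch-all predicate like this can prevent the database from using indexes well, so it suits small tables better than large ones.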


Answer by BoltBait

Never normalize a database to the point that writing a query becomes near impossible.

Example: Concatenating arbitrary number of rows of strings in mysql (hierarchical query)


Answer by Allan Simonsen

Aliasing tables and joining a table with itself multiple times:

select pf1.PageID, pf1.value as FirstName, pf2.value as LastName
from PageFields pf1, PageFields pf2
where pf1.PageID = 42
and   pf2.PageID = 42
and   pf1.FieldName = 'FirstName'
and   pf2.FieldName = 'LastName'

Edit: If I have the table PageFields with rows:

id | PageID | FieldName | Value 
.. | ...    | ...       | ... 
17 | 42     | LastName  | Dent
.. | ...    | ...       | ... 
23 | 42     | FirstName | Arthur
.. | ...    | ...       | ... 

Then the above SQL would return:

42, 'Arthur', 'Dent'
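
A runnable sketch of the same self-join, assuming Python's built-in sqlite3 module. It uses an explicit JOIN on PageID instead of repeating the page number for both aliases, which is a substitution made here, and the extra row for page 7 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PageFields (id INTEGER PRIMARY KEY, PageID INTEGER, FieldName TEXT, Value TEXT);
    INSERT INTO PageFields VALUES
        (17, 42, 'LastName', 'Dent'),
        (23, 42, 'FirstName', 'Arthur'),
        (31, 7, 'FirstName', 'Ford');   -- hypothetical row for another page
""")

# pf1 and pf2 are two aliases of the same table, one per field we want as a column.
row = conn.execute("""
    SELECT pf1.PageID, pf1.Value AS FirstName, pf2.Value AS LastName
    FROM PageFields pf1
    JOIN PageFields pf2 ON pf2.PageID = pf1.PageID
    WHERE pf1.PageID = ?
      AND pf1.FieldName = 'FirstName'
      AND pf2.FieldName = 'LastName'
""", (42,)).fetchone()
```

Joining on PageID means the page number is supplied once, so the two aliases cannot drift apart.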

Answer by Chris Nava

Take advantage of SQL's ability to output not just database data but concatenated text to generate more SQL or even Java code.

  • Generate insert statements
    • select 'insert .... values(' + col1 ... + ')' from persontypes
  • Generate the contents of an Enum from a table.
    • ...
  • Generate java Classes from table names
    • select 'public class ' + name + '{\n}' from sysobjects where...

EDIT: Don't forget that some databases can output XML which saves you lots of time reformatting output for client applications.


Answer by DevinB (Jan 28, 2009)

This doesn't necessarily save you coding time, but this missing indexes query can save you the time of manually figuring out what indexes to create. It is also helpful because it shows actual usage of the indexes, rather than the usage you 'thought' would be common.

http://blogs.msdn.com/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx


Answer by Scorpi0

[Oracle] How to not explode your rollback segment :

delete
from myTable
where c1 = 'yeah';
commit;

It might never finish if there is too much data to delete...

create table temp_myTable
as
select *
from myTable
where c1 != 'yeah';
drop table myTable;
rename temp_myTable to myTable;

Just recreate the indexes and recompile dependent objects, and you are done!


Answer by le dorfier

Off the top of my head:

  1. Use your editor artistry to make it easy to highlight subsections of a query so you can test them easily in isolation.

  2. Embed test cases in the comments so you can highlight and execute them easily. This is especially handy for stored procedures.

  3. Obviously a really popular technique is getting the folks on Stack Overflow to work out the hard ones for you. :) We SQL freaks are real suckers for pop quizzes.


Answer by jimmyorr (Jan 30, 2009)

Tom Kyte's Oracle implementation of MySQL's group_concat aggregate function to create a comma-delimited list:

with data as
     (select job, ename,
             row_number () over (partition by job order by ename) rn,
             count (*) over (partition by job) cnt
        from emp)
    select job, ltrim (sys_connect_by_path (ename, ','), ',') scbp
      from data
     where rn = cnt
start with rn = 1
connect by prior job = job and prior rn = rn - 1
  order by job

see: http://tkyte.blogspot.com/2006/08/evolution.html


Answer by Ralph Lavelle

Using Boolean shortcuts in the filters to avoid what I used to do (horrible string concatenation before executing the final string) before I knew better. This example is from a search stored procedure where the user may or may not enter the customer's first name and last name:

    @CustomerFirstName      VarChar(50) = NULL,
    @CustomerLastName       VarChar(50) = NULL,

    SELECT   * -- (I know, I know)
    FROM     Customer c
    WHERE    ((@CustomerFirstName IS NOT NULL AND 
               c.FirstName = @CustomerFirstName)
             OR @CustomerFirstName IS NULL)
    AND      ((@CustomerLastName IS NOT NULL AND 
               c.LastName = @CustomerLastName)
             OR @CustomerLastName IS NULL)

Answer by Rex Miller (Jan 31, 2009)

Not detailed enough and too far down to win the bounty but...

Did anyone already mention UNPIVOT? It lets you normalize data on the fly from:

Client | 2007 Value | 2008 Value | 2009 Value
---------------------------------------------
Foo         9000000     10000000     12000000
Bar               -     20000000     15000000

To:

Client | Year | Value
-------------------------
Foo      2007    9000000
Foo      2008   10000000
Bar      2008   20000000
Foo      2009   12000000
Bar      2009   15000000

And PIVOT, which pretty much does the opposite.

Those are my big ones in the last few weeks. Additionally, reading Jeff's SQL Server Blog is my best overall means of saving time and/or code vis a vis SQL.


Answer by Max Gontar

1. Hierarchical tree formatting SELECT using CTE (MS SQL 2005)

Say you have a table with a hierarchical tree structure (departments, for example) and you need to output it in a CheckBoxList or in a Label this way:

     Main Department  
      Department 1 
      Department 2
       SubDepartment 1 
      Department 3

Then you can use such query:

WITH Hierarchy(DepartmentID, Name, ParentID, Indent, Type) AS 
( 
  -- First we will take the highest Department (Type = 1)
  SELECT DepartmentID, Name, ParentID, 
  -- We will need this field for correct sorting    
  Name + CONVERT(VARCHAR(MAX), DepartmentID) AS Indent, 
  1 AS Type 
  FROM Departments WHERE Type = 1 
  UNION ALL 
  -- Now we will take the other records in recursion
  SELECT SubDepartment.DepartmentID, SubDepartment.Name, 
  SubDepartment.ParentID, 
  CONVERT(VARCHAR(MAX), Indent) + SubDepartment.Name + CONVERT(VARCHAR(MAX),
  SubDepartment.DepartmentID) AS Indent, ParentDepartment.Type + 1 
  FROM Departments SubDepartment 
  INNER JOIN Hierarchy ParentDepartment ON 
    SubDepartment.ParentID = ParentDepartment.DepartmentID 
) 
-- Final select
SELECT DepartmentID, 
-- Now we need to put some spaces (or any other symbols) to make it 
-- look-like hierarchy
REPLICATE(' ', Type - 1) + Name AS DepartmentName, ParentID, Indent 
FROM Hierarchy 
UNION 
-- Default value
SELECT -1 AS DepartmentID, 'None' AS DepartmentName, -2, ' ' AS Indent 
-- Important to sort by this field to preserve correct Parent-Child hierarchy
ORDER BY Indent ASC

Other samples

Using stored procedure: http://vyaskn.tripod.com/hierarchies_in_sql_server_databases.htm

Plain select for limited nesting level: http://www.sqlteam.com/article/more-trees-hierarchies-in-sql

Another solution using CTE: http://www.sqlusa.com/bestpractices2005/executiveorgchart/

2. Last Date selection with grouping - using RANK() OVER

Imagine an Events table with ID, User, Date and Description columns. You need to select the latest Event for each User. There is no guarantee that the Event with the highest ID has the latest Date.

What you can do is play around with INNER SELECT, MAX, GROUPING like this:

SELECT E.UserName, E.Description, E.Date 
FROM Events E
INNER JOIN 
(
    SELECT UserName, MAX(Date) AS MaxDate FROM Events
    GROUP BY UserName
) AS EG ON E.UserName = EG.UserName AND E.Date = EG.MaxDate

But I prefer to use RANK() OVER:

SELECT EG.UserName, EG.Description, EG.Date  FROM
(
    SELECT RANK() OVER(PARTITION BY UserName ORDER BY Date DESC) AS N, 
        E.UserName, E.Description, E.Date 
    FROM Events E
) AS EG
WHERE EG.N = 1

It's more complicated, but it seems more correct to me.

3. Paging using TOP and NOT IN

There is already paging here, but I just can't forget this great experience:

DECLARE @RowNumber INT, @RecordsPerPage INT, @PageNumber INT
SELECT @RecordsPerPage = 6, @PageNumber = 7
SELECT TOP(@RecordsPerPage) *  FROM [TableName] 
WHERE ID NOT IN
(
    SELECT TOP((@PageNumber-1)*@RecordsPerPage) ID 
    FROM [TableName]
    ORDER BY Date ASC
)
ORDER BY Date ASC

4. Set variable values in dynamic SQL with REPLACE

Instead of ugly

SET @SELECT_SQL = 'SELECT * FROM [TableName] 
    WHERE Date < ' + CAST(@Date AS VARCHAR) + ' AND Flag = ' + @Flag

It's easier, safer, and more readable to use REPLACE:

DECLARE @VAR_SQL VARCHAR(3000), @SELECT_SQL VARCHAR(3000)
DECLARE @Id INT
SET @Id = 3
DECLARE @Flag VARCHAR(1)
SET @Flag = 'X'
DECLARE @Date DATETIME
SET @Date = GETDATE()
SET @VAR_SQL = 
'DECLARE @Date DATETIME 
SET @Date = CAST(:Date AS DATETIME) 
'
SET @SELECT_SQL = 'SELECT * FROM [TableName] 
    WHERE Id > :Id AND Flag = :Flag AND Date < @Date'

SET @SELECT_SQL = 
    REPLACE(@SELECT_SQL, ':Flag', QUOTENAME(CONVERT(VARCHAR, @Flag),''''))
SET @SELECT_SQL = REPLACE(@SELECT_SQL, ':Id', CONVERT(VARCHAR, @Id))
SET @VAR_SQL = 
    REPLACE(@VAR_SQL, ':Date', QUOTENAME(CONVERT(VARCHAR, @Date),''''))

PRINT(@VAR_SQL + @SELECT_SQL)
EXEC(@VAR_SQL + @SELECT_SQL)

5. DROP before CREATE

There are some good practices for writing stored procedures or functions, one of them is to include IF EXISTS ... DROP block in procedure creation script.

IF EXISTS 
(   
    SELECT 1 FROM sysobjects 
    WHERE id = OBJECT_ID(N'[ProcedureName]') 
        AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
DROP PROCEDURE [ProcedureName]
GO

IF EXISTS 
(   
    SELECT 1 FROM sysobjects 
    WHERE id = OBJECT_ID(N'[ScalarFunctionName]') 
        AND OBJECTPROPERTY(id, N'IsScalarFunction') = 1
)
DROP FUNCTION [ScalarFunctionName]
GO

IF EXISTS 
(   
    SELECT 1 FROM sysobjects 
    WHERE id = OBJECT_ID(N'[TableFunctionName]') 
        AND OBJECTPROPERTY(id, N'IsTableFunction') = 1
)
DROP FUNCTION [TableFunctionName]
GO

Talking about temporary tables:

IF OBJECT_ID('tempdb..#TEMP') IS NOT NULL
DROP TABLE #TEMP
CREATE TABLE #TEMP(ID INT, DATESTART DATETIME, DATEEND DATETIME)

6. Lot of dynamic sql, temp tables, and others on Erland Sommarskog's home page


Answer by Peter Mortensen

SQL's Pivot command (PDF). Learn it. Live it.


Answer by aekeus (Jan 28, 2009)

There are a few things that can be done to minimize the amount of code that needs to be written and insulate you from code changes when the database schema changes (it will).

So, in no particular order:

  1. DRY up your schema - get it into third normal form
  2. DML and Selects can come via views in your client code
    • When your underlying tables change, update the view
    • Use INSTEAD OF triggers to intercept DML calls to the view - then update the necessary tables
  3. Build an external data dictionary containing the structure of your database - build the DDL from the dictionary. When you change database products, write a new parser to build the DDL for your specific server type.
  4. Use constraints, and check for them in your code. The database that only has one piece of client code interacting with it today, will have two tomorrow (and three the next day).

Answer by Evgeny (Jan 28, 2009)

Using the WITH statement together with the ROW_NUMBER function to perform a search and at the same time sort the results by a required field. Consider the following query, for example (it is part of a stored procedure):

    DECLARE @SortResults int;

SELECT @SortResults = 
    CASE @Column WHEN 0 THEN -- sort by Receipt Number
        CASE @SortOrder WHEN 1 THEN 0 -- sort Ascending
                        WHEN 2 THEN 1 -- sort Descending
        END
                WHEN 1 THEN -- sort by Payer Name
        CASE @SortOrder WHEN 1 THEN 2 -- sort Ascending
                        WHEN 2 THEN 3 -- sort Descending
        END
                WHEN 2 THEN -- sort by Date/Time paid
        CASE @SortOrder WHEN 1 THEN 4 -- sort Ascending
                        WHEN 2 THEN 5 -- sort Descending
        END
                WHEN 3 THEN -- sort by Amount
        CASE @SortOrder WHEN 1 THEN 6 -- sort Ascending
                        WHEN 2 THEN 7 -- sort Descending
        END
    END;

    WITH SelectedReceipts AS
    (
        SELECT TOP (@End) Receipt.*,

        CASE @SortResults
            WHEN 0 THEN ROW_NUMBER() OVER (ORDER BY Receipt.ReceiptID)
            WHEN 1 THEN ROW_NUMBER() OVER (ORDER BY Receipt.ReceiptID DESC)
            WHEN 2 THEN ROW_NUMBER() OVER (ORDER BY Receipt.PayerName)
            WHEN 3 THEN ROW_NUMBER() OVER (ORDER BY Receipt.PayerName DESC)
            WHEN 4 THEN ROW_NUMBER() OVER (ORDER BY Receipt.DatePaid)
            WHEN 5 THEN ROW_NUMBER() OVER (ORDER BY Receipt.DatePaid DESC)
            WHEN 6 THEN ROW_NUMBER() OVER (ORDER BY Receipt.ReceiptTotal)
            WHEN 7 THEN ROW_NUMBER() OVER (ORDER BY Receipt.ReceiptTotal DESC)
        END

        AS RowNumber

        FROM Receipt

        WHERE
        ( Receipt.ReceiptID LIKE '%' + @SearchString + '%' )

        ORDER BY RowNumber
    )

    SELECT * FROM SelectedReceipts
    WHERE RowNumber BETWEEN @Start AND @End

Answer by pablito

Calculating the product of all rows (x1*x2*x3....xn) in one "simple" query

SELECT exp(sum(log(someField)))  FROM Orders

taking advantage of the logarithm properties:

  1. log(x) + log(y) = log(x*y)

  2. exp(log(xy)) = xy

not that I will ever need something like that.......
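
For the curious, a minimal sketch of the same trick, assuming SQLite through Python's sqlite3 module. The orders table and some_field column are made up, and ln/exp are registered by hand because not every SQLite build ships the math functions. It only works for strictly positive values, since the logarithm of zero or a negative number is undefined.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register the math functions explicitly (assumption: they may be missing
# from the SQLite build that Python is linked against).
conn.create_function("ln", 1, math.log)
conn.create_function("exp", 1, math.exp)

conn.execute("CREATE TABLE orders (some_field REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(2.0,), (3.0,), (4.0,)])

# exp(ln(2) + ln(3) + ln(4)) is the product 2 * 3 * 4, up to rounding error.
(product,) = conn.execute(
    "SELECT exp(sum(ln(some_field))) FROM orders"
).fetchone()
print(round(product, 6))
```

The result is a floating-point approximation, so round it before comparing against an exact value.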


Answer by geofftnz

Kind of off-topic and subjective, but pick a coding style and stick to it.

It will make your code many times more readable when you have to revisit it. Separate sections of the SQL query into parts. This can make cut-and-paste coding easier because individual clauses are on their own lines. Aligning different parts of join and where clauses makes it easy to see what tables are involved, what their aliases are, what the parameters to the query are...

Before:

select it.ItemTypeName, i.ItemName, count(ti.WTDLTrackedItemID) as ItemCount
from WTDL_ProgrammeOfStudy pos inner join WTDL_StudentUnit su
on su.WTDLProgrammeOfStudyID = pos.WTDLProgrammeOfStudyID inner join
WTDL_StudentUnitAssessment sua on sua.WTDLStudentUnitID = su.WTDLStudentUnitID
inner join WTDL_TrackedItem ti on ti.WTDLStudentUnitAssessmentID = sua.WTDLStudentUnitAssessmentID
inner join WTDL_UnitItem ui on ti.WTDLUnitItemID = ui.WTDLUnitItemID inner
join WTDL_Item i on ui.WTDLItemID = i.WTDLItemID inner join WTDL_ItemType it
on i.WTDLItemTypeID = it.WTDLItemTypeID where it.ItemTypeCode = 'W' and i.ItemName like 'A%'
group by it.ItemTypeName, i.ItemName order by it.ItemTypeName, i.ItemName

After:

select          it.ItemTypeName,
                i.ItemName,
                count(ti.WTDLTrackedItemID) as ItemCount

from            WTDL_ProgrammeOfStudy            pos
inner join      WTDL_StudentUnit                 su        on su.WTDLProgrammeOfStudyID = pos.WTDLProgrammeOfStudyID
inner join      WTDL_StudentUnitAssessment       sua       on sua.WTDLStudentUnitID = su.WTDLStudentUnitID
inner join      WTDL_TrackedItem                 ti        on ti.WTDLStudentUnitAssessmentID = sua.WTDLStudentUnitAssessmentID
inner join      WTDL_UnitItem                    ui        on ti.WTDLUnitItemID = ui.WTDLUnitItemID
inner join      WTDL_Item                        i         on ui.WTDLItemID = i.WTDLItemID
inner join      WTDL_ItemType                    it        on i.WTDLItemTypeID = it.WTDLItemTypeID

where           it.ItemTypeCode         = 'W'
and             i.ItemName              like 'A%'

group by        it.ItemTypeName,
                i.ItemName

order by        it.ItemTypeName,
                i.ItemName

Answer by user59861 (Jan 28, 2009)

Three words... UPDATE FROM WHERE


Answer by GregD

That would be

copy & paste

But in all seriousness, I've gotten in the habit of formatting my code so that lines are much easier to comment out. For instance, I drop all new lines down from their SQL commands and put the commas at the end instead of where I used to put them (at the beginning). So my code ends up looking like this:

Select
    a.deposit_no,
    a.amount
From 
    dbo.bank_tran a
Where
    a.tran_id = '123'

Oh and ALIASING!


Answer by jimmyorr

Analytic functions like rank, dense_rank, or row_number to provide complex ranking.
The following example gives employees a rank in their deptno, based on their salary and hiredate (highest paid, oldest employees):

select e.*,
       rank() over (
                      partition by deptno 
                      order by sal desc, hiredate asc
                   ) rank
from emp e
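
To try this without an Oracle instance handy, here is a minimal sketch assuming SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module. The sample rows are invented and the alias dept_rank is my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER, hiredate TEXT)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("KING", 10, 5000, "1981-11-17"),
        ("CLARK", 10, 2450, "1981-06-09"),
        ("SCOTT", 20, 3000, "1987-04-19"),
        ("FORD", 20, 3000, "1981-12-03"),
        ("SMITH", 20, 800, "1980-12-17"),
    ],
)

# Rank restarts for every department; ties on salary are broken by hire date.
ranked = conn.execute(
    """
    SELECT ename, deptno,
           rank() OVER (
               PARTITION BY deptno
               ORDER BY sal DESC, hiredate ASC
           ) AS dept_rank
    FROM emp
    ORDER BY deptno, dept_rank
    """
).fetchall()
for row in ranked:
    print(row)
```

FORD and SCOTT earn the same salary, so the earlier hire date decides which of them ranks first in department 20.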

Answer by Cape Cod Gunny

I wrote a stored procedure called spGenerateUpdateCode. You passed it a tablename or viewname and it generated an entire T-SQL Stored Procedure for updating that table. All I had to do was copy and paste into TextPad (my favorite editor). Do some minor find and replaces and minimal tweaking and BAM... update done.

I would create special views of base tables and call spGenerateUpdateCode when I needed to do partial updates.

That single 6 hour coding session saved me hundreds of hours.

This proc created two blocks of code. One for inserts and one for updates.


Answer by AJ.

I offer these suggestions, which have helped me:

Stored procedures and views

Use stored procedures to encapsulate complex joins over many tables - both for selects and for updates/inserts. You can also use views where the joins don't involve too many tables. (where "too many" is a vague quantity between 4 and 10).

So, for example, if you want information on a customer, and it's spread over lots of tables, like "customer", "address", "customer status code", "order", "invoice", etc., you could create a stored procedure called "getCustomerFullDetail" which joins all those tables, and your client code can just call that and never have to worry about the table structure.

For updates, you can create "updateCustomerFullDetail", which could apply updates sensibly.

There will be some performance hits for this, and writing the stored procedures might be non-trivial, but you're writing the non-trivial code once, in SQL (which is typically succinct).

Normalisation

Normalise your database.

Really.

This results in cleaner (simpler) update code which is easier to maintain.
It may have other benefits which are not in scope here.

I normalise to at least 4NF.

4NF is useful because it includes making all your lists of possible values explicit, so your code doesn't have to know about, e.g., all possible status codes, and you don't hard-code lists in client code.

(3NF is the one which really sorts out those update anomalies.)

Perhaps use an ORM?

This is as much a question as a suggestion: would a good ORM reduce the amount of code you have to write? Or does it just remove some of the pain from moving data from the database to the client? I haven't played with one enough.


Answer by rp.

Learn T4!

It's a great little tool to have around. Creating templates is a little work at first, but not hard at all once you get the hang of it. I know that in the age of ORMs, the example below is perhaps dated, but you'll get the idea.

See these links for more on T4:

Start here:

Others of interest:

The T4 template

<#@ template language="C#" #>
<#@ output extension="CS" #>
<#@ assembly name="Microsoft.SqlServer.ConnectionInfo" #>
<#@ assembly name="Microsoft.SqlServer.Smo" #>
<#@ import namespace="Microsoft.SqlServer.Management.Smo" #>
<#@ import namespace="System.Collections.Specialized" #>
<#@ import namespace="System.Text" #>

<#
    Server server = new Server( @"DUFF\SQLEXPRESS" );
    Database database = new Database( server, "Desolate" );
    Table table = new Table( database, "ConfirmDetail" );
    table.Refresh();

    WriteInsertSql( table );
#>

<#+
    private void WriteInsertSql( Table table )
    {
        PushIndent( "    " );
        WriteLine( "const string INSERT_SQL = " );
        PushIndent( "    " );
        WriteLine( "@\"INSERT INTO " + table.Name + "( " );

        PushIndent( "    " );
        int count = 0;
        // Table columns.
        foreach ( Column column in table.Columns )
        {
            count++;
            Write( column.Name );
            if ( count < table.Columns.Count ) Write( ",\r\n" );
        }
        WriteLine( " )" );
        PopIndent();

        WriteLine( "values (" );
        PushIndent( "    " );
        count = 0;
        // Table columns.
        foreach ( Column column in table.Columns )
        {
            count++;
            Write( "@" + column.Name );
            if ( count < table.Columns.Count ) Write( ",\r\n" );
        }
        WriteLine( " )\";" );
        PopIndent();
        PopIndent();
        PopIndent();
        WriteLine( "" );
    }
#>

outputs this for any table specified:

const string INSERT_SQL =
    @"INSERT INTO ConfirmDetail(
        ConfirmNumber,
        LineNumber,
        Quantity,
        UPC,
        Sell,
        Description,
        Pack,
        Size,
        CustomerNumber,
        Weight,
        Ncp,
        DelCode,
        RecordID )
    values (
        @ConfirmNumber,
        @LineNumber,
        @Quantity,
        @UPC,
        @Sell,
        @Description,
        @Pack,
        @Size,
        @CustomerNumber,
        @Weight,
        @Ncp,
        @DelCode,
        @RecordID )";

Answer by Paul W Homer (Jan 28, 2009)

Way back, I wrote dynamic SQL in a C program that took a table as an argument. It would then access the database (Ingres in those days) to check the structure, and using a WHERE clause, load any matching row into a dynamic hash/array table.

From there, I would just lookup the indices to the values as I used them. It was pretty slick, and there was no other SQL code in the source (also it had a feature to be able to load a table directly into a tree).

The code was a bit slower than brute force, but it optimized the overall program because I could quickly do partitioning of the data in the code, instead of in the database.

Paul.


Answer by inspite (Jan 28, 2009)

Make sure you know what SELECT can do.

I used to spend hours writing dumb queries that SQL does out of the box (eg NOT IN and HAVING spring to mind)


Answer by Garry Shutler

What I call the sum case construct. It's a conditional count. A decent example of it is this answer to a question.
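
Since the linked example didn't survive, here is a minimal sketch of what a conditional count might look like, assuming SQLite through Python's sqlite3 module. The orders table and its status values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",), ("shipped",), ("open",), ("cancelled",)],
)

# Each SUM(CASE ...) counts only the rows matching its condition,
# so several different counts come back from one pass over the table.
totals = conn.execute(
    """
    SELECT COUNT(*)                                             AS total,
           SUM(CASE WHEN status = 'open'    THEN 1 ELSE 0 END) AS open_count,
           SUM(CASE WHEN status = 'shipped' THEN 1 ELSE 0 END) AS shipped_count
    FROM orders
    """
).fetchone()
print(totals)
```

The ELSE 0 makes a count with no matching rows come back as 0 rather than NULL.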


Answer by Barry (Feb 05, 2009)

De-dup a table fast and easy. This SQL is Oracle-specific, but can be modified as needed for whatever DB you are using:

DELETE table1 WHERE rowid NOT IN (SELECT MAX(rowid) FROM table1 GROUP BY dup_field)
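
SQLite happens to expose a rowid as well, so the statement can be tried almost verbatim there. A minimal sketch, assuming Python's sqlite3 module and an invented payload column; note that SQLite requires the FROM keyword in DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (dup_field TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("a", "first"), ("a", "second"), ("b", "only"), ("a", "third")],
)

# Keep the row with the highest rowid in each dup_field group, delete the rest.
conn.execute(
    """
    DELETE FROM table1
    WHERE rowid NOT IN (SELECT MAX(rowid) FROM table1 GROUP BY dup_field)
    """
)
remaining = conn.execute(
    "SELECT dup_field, payload FROM table1 ORDER BY dup_field"
).fetchall()
print(remaining)
```

Swapping MAX for MIN would keep the earliest row of each group instead of the latest.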


Answer by Allethrin (Jan 29, 2009)

Derived tables. Example below is simple (and makes more sense as a join), but in more complex cases they can be very handy. Using these means you don't have to insert a temporary result set into a table just to use it in a query.

SELECT   tab1.value1,
         tab2.value1
FROM     mytable tab1,
    (    SELECT id,
                value1 = somevalue
         FROM   anothertable
         WHERE  id2 = 1234 ) tab2
WHERE   tab1.id = tab2.id
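
A slightly less trivial sketch, where the derived table holds an aggregate that would otherwise need a temp table. This assumes SQLite through Python's sqlite3 module; the customers and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# The parenthesised SELECT acts as an inline, throwaway result set (alias t).
rows = conn.execute(
    """
    SELECT c.name, t.total
    FROM customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders
          GROUP BY customer_id) t
      ON t.customer_id = c.id
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

Cy has no orders, so the inner join leaves that customer out.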

Answer by Eric Johnson

Sqsh (pronounced skwish) is short for SQshelL (pronounced s-q-shell), it is intended as a replacement for the venerable 'isql' program supplied by Sybase. It came about due to years of frustration of trying to do real work with a program that was never meant to perform real work.

My favorite feature is that it contains a (somewhat feeble) scripting language which allows a user to source handy functions like this from a .sqshrc config file:

\func -x droptablelike
   select name from sysobjects where name like "${1}" and type = 'U'
   \do
      \echo dropping #1
      drop table #1
      go
   \done
\done

Answer by Bill Karwin (Jan 30, 2009)

Generating SQL to update one table based on the contents of another table.

Some database brands such as MySQL and Microsoft SQL Server support multi-table UPDATE syntax, but this is non-standard SQL and as a result each vendor implements different syntax.

So to make this operation more portable, or when we had to do it years ago before the feature existed in any SQL implementation, you could use this technique.

Say for example you have employees and departments. You keep a count of employees per department as an integer in the departments table (yes this is denormalized, but assume for the moment that it's an important optimization).

As you change the employees of a department through hiring, firing, and transfers, you need to update the count of employees per department. Suppose you don't want to or can't use subqueries.

SELECT 'UPDATE departments SET emp_count = ' || COUNT(e.emp_id) 
  || ' WHERE dept_id = ' || e.dept_id || ';'
FROM employees e
GROUP BY e.dept_id;

Then capture the output, which is a collection of SQL UPDATE statements. Run this as an SQL script.

It doesn't have to be a query using GROUP BY, that's just one example.
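
The whole round trip can be sketched in a few lines, assuming SQLite through Python's sqlite3 module and a made-up handful of rows. The generating query is the one above; the script just captures its output and runs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, emp_count INTEGER DEFAULT 0);
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, dept_id INTEGER);
INSERT INTO departments (dept_id) VALUES (1), (2);
INSERT INTO employees (emp_id, dept_id) VALUES (10, 1), (11, 1), (12, 2);
""")

# Step 1: let the database write one UPDATE statement per department.
generated = [
    row[0]
    for row in conn.execute(
        """
        SELECT 'UPDATE departments SET emp_count = ' || COUNT(e.emp_id)
            || ' WHERE dept_id = ' || e.dept_id || ';'
        FROM employees e
        GROUP BY e.dept_id
        """
    )
]

# Step 2: run the captured output as an SQL script.
conn.executescript("\n".join(generated))
counts = conn.execute(
    "SELECT dept_id, emp_count FROM departments ORDER BY dept_id"
).fetchall()
print(generated)
print(counts)
```

Building SQL from data like this is only reasonable when the values are trusted (here they are integers from your own tables); text values would need proper quoting.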


Answer by mjy (Jan 30, 2009)

The nested set method for storing trees / hierarchical data, as explained in Joe Celko's famous book ("SQL for smarties") and also e.g. here (too long to post here).


Answer by user29439 (Jan 30, 2009)

You simply must love the Tally table approach to looping. No WHILE or CURSOR loops needed. Just build a table and use a join for iterative processing. I use it primarily for parsing data or splitting comma-delimited strings.

This approach saves on both typing and performance.

From Jeff's post, here are some code samples:

--Build the tally table:

IF OBJECT_ID('dbo.Tally') IS NOT NULL
     DROP TABLE dbo.Tally

SELECT TOP 10000 IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
    Master.dbo.SysColumns sc2

ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--Split a CSV column

--Build a table with a CSV column.
CREATE TABLE #MyHead (
    PK INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    CsvColumn VARCHAR(500)
)
INSERT INTO #MyHead 
SELECT '1,5,3,7,8,2'
UNION ALL SELECT '7,2,3,7,1,2,2'
UNION ALL SELECT '4,7,5'
UNION ALL SELECT '1'
UNION ALL SELECT '5'
UNION ALL SELECT '2,6'
UNION ALL SELECT '1,2,3,4,55,6'

SELECT mh.PK,
    SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value
FROM dbo.Tally t
    CROSS JOIN #MyHead mh
WHERE N < LEN(','+mh.CsvColumn+',')
    AND SUBSTRING (','+mh.CsvColumn+',',N,1) = ','
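
The same idea ported to SQLite, as a minimal sketch through Python's sqlite3 module. This is an adaptation, not Jeff's code: the tally table is filled with a recursive CTE, and since SQLite's instr() has no start-position argument, the search runs over the remainder of the string instead. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Build the tally table: one row per integer from 1 to 1000.
conn.execute("CREATE TABLE tally (n INTEGER PRIMARY KEY)")
conn.execute(
    """
    WITH RECURSIVE seq(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 1000
    )
    INSERT INTO tally (n) SELECT n FROM seq
    """
)

conn.execute("CREATE TABLE demo (pk INTEGER PRIMARY KEY, csv_column TEXT)")
conn.executemany(
    "INSERT INTO demo (csv_column) VALUES (?)", [("1,5,3",), ("4,7,55",)]
)

# Wrap each string in commas, then every comma position n (except the last)
# marks the start of one element; no loop or cursor is involved.
values = conn.execute(
    """
    SELECT d.pk,
           substr(',' || d.csv_column || ',', t.n + 1,
                  instr(substr(',' || d.csv_column || ',', t.n + 1), ',') - 1)
               AS value
    FROM tally t
        CROSS JOIN demo d
    WHERE t.n < length(',' || d.csv_column || ',')
      AND substr(',' || d.csv_column || ',', t.n, 1) = ','
    ORDER BY d.pk, t.n
    """
).fetchall()
print(values)
```

The tally table needs at least as many rows as the longest wrapped string has characters; 1000 is an arbitrary choice here.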

Answer by Timur Fanshteyn (Jan 31, 2009)

Use Excel to generate SQL Queries. This works great when you need to insert, update, delete rows based on a CSV that was provided to you. All you have to do is create the right CONCAT() formula, and then drag it down to create the SQL Script


Answer by Bernard Dy (Feb 03, 2009)

The SQL MERGE command:

In the past developers had to write code to handle situations where in one condition the database does an INSERT but in others (like when the key already exists) they do an UPDATE.

Now databases support the "upsert" operation in SQL, which will take care of some of that logic for you in a more concise fashion. Oracle and SQL Server both call it MERGE. The SQL Server 2008 version is pretty powerful; I think it can also be configured to handle some DELETE operations.
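
To get a feel for the upsert idea without Oracle or SQL Server, here is a minimal sketch using SQLite through Python's sqlite3 module. SQLite has no MERGE, so this swaps in its INSERT ... ON CONFLICT DO UPDATE clause (SQLite 3.24 or later), which covers the insert-or-update case only. The inventory table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")

# One statement: insert the row, or add to the existing quantity
# when the key is already present.
upsert = """
    INSERT INTO inventory (sku, qty) VALUES (?, ?)
    ON CONFLICT(sku) DO UPDATE SET qty = qty + excluded.qty
"""
conn.execute(upsert, ("widget", 5))   # new key: inserted
conn.execute(upsert, ("widget", 3))   # existing key: updated
conn.execute(upsert, ("gadget", 1))   # new key: inserted

stock = conn.execute("SELECT sku, qty FROM inventory ORDER BY sku").fetchall()
print(stock)
```

The application never has to check whether the row exists first, which is the branch of code the answer is talking about.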


Answer by SAMills (Feb 03, 2009)

It's not specifically a coding trick but indeed a very helpful (and missing) aid to SQL Server Management Studio:

SQL Prompt - Intelligent code completion and layout for MS SQL Server

There are many answers already provided where the outcome was having written snippets in the past that eliminate the need to write the same in the future. I believe Code Completion through intellisense definitely falls into this category. It allows me to concentrate on the logic without worrying so much about the syntax of T-SQL or the schema of the database/table/...


Answer by Quassnoi

Using Oracle hints for a select last effective date query.

For instance, exchange rates for a currency change several times a day, at irregular intervals. The effective rate for a given moment is the most recent rate published at or before that moment.

You need to select the effective exchange rate for each transaction from a table:

CREATE TABLE transactions (xid NUMBER, xsum FLOAT, xdate DATE, xcurrency NUMBER);
CREATE TABLE rates (rcurrency NUMBER, rdate DATE, rrate FLOAT);
CREATE UNIQUE INDEX ux_rate_currency_date ON rates (rcurrency, rdate);

SELECT  (
    SELECT  /*+ INDEX_DESC (r ux_rate_currency_date) */
        rrate
    FROM    rates r
    WHERE   r.rcurrency = x.xcurrency
        AND r.rdate <= x.xdate
        AND rownum = 1
    ) AS eff_rate, xsum, xdate
FROM    transactions x

This is not recommended by Oracle, as you rely on index to enforce SELECT order.

But you cannot pass an argument to a double-nested subquery, and have to do this trick.

P.S. It actually works in a production database.


Answer by Colin Pickard (Feb 17, 2009)

SQL Hacks lives on my desk. It is a compendium of useful SQL tricks.


Answer by Christopher Klein (Mar 20, 2009)

Nice quick little utility script I use for when I need to find an ANYTHING in a SQL object (works on MSSQL 2000 and beyond). Just change the @TEXT

SET NOCOUNT ON

DECLARE @TEXT   VARCHAR(250)
DECLARE @SQL    VARCHAR(1000) -- widened so long database names or search strings don't truncate the statement

SELECT  @TEXT='WhatDoIWantToFind'

CREATE TABLE #results (db VARCHAR(64), objectname VARCHAR(100),xtype VARCHAR(10), definition TEXT)

SELECT @TEXT as 'Search String'
DECLARE #databases CURSOR FOR SELECT NAME FROM master..sysdatabases where dbid>4
    DECLARE @c_dbname varchar(64)   
    OPEN #databases
    FETCH #databases INTO @c_dbname   
    WHILE @@FETCH_STATUS <> -1
    BEGIN
        SELECT @SQL = 'INSERT INTO #results '
        SELECT @SQL = @SQL + 'SELECT ''' + @c_dbname + ''' AS db, o.name,o.xtype,m.definition '   
        SELECT @SQL = @SQL + ' FROM '+@c_dbname+'.sys.sql_modules m '   
        SELECT @SQL = @SQL + ' INNER JOIN '+@c_dbname+'..sysobjects o ON m.object_id=o.id'   
        SELECT @SQL = @SQL + ' WHERE [definition] LIKE ''%'+@TEXT+'%'''   
        EXEC(@SQL)
        FETCH #databases INTO @c_dbname
    END
    CLOSE #databases
DEALLOCATE #databases

SELECT * FROM #results order by db, xtype, objectname
DROP TABLE #results

The next one is referred to as an UPSERT. I think in MSSQL 2008 you can use a MERGE command, but before that you had to do it in two parts. Your application sends data back to a stored procedure, but you don't necessarily know whether you should be updating existing data or inserting NEW data. This does both, depending on what's already there:

DECLARE @Updated TABLE (CodeIdentifier VARCHAR(10))

UPDATE AdminOverride 
SET Type1='CMBS'
OUTPUT inserted.CodeIdentifier INTO @Updated
FROM AdminOverride a 
INNER JOIN ItemTypeSecurity b
      ON a.CodeIdentifier = b.CodeIdentifier

INSERT INTO AdminOverride
SELECT c.CodeIdentifier
      ,Rating=NULL
      ,Key=NULL
      ,IndustryType=NULL
      ,ProductGroup=NULL
      ,Type1='CMBS'
      ,Type2=NULL
      ,SubSectorDescription=NULL
      ,WorkoutDate=NULL
      ,Notes=NULL
      ,EffectiveMaturity=NULL
      ,CreatedDate=GETDATE()
      ,CreatedBy=SUSER_NAME()
      ,ModifiedDate=NULL
      ,ModifiedBy=NULL
FROM dbo.ItemTypeSecurity c 
LEFT JOIN @Updated u
      ON c.CodeIdentifier = u.CodeIdentifier
WHERE u.CodeIdentifier IS NULL 

If a record existed, it was updated AND its identifier was recorded in the @Updated table; the INSERT command then only happens for records that are NOT in @Updated.


Answer by ob.

Using variables in the SQL where clause to cut down on conditional logic in your code/database. You can compare your variable's value against some default (0 for int, let's say), and filter only if they're not equal. For example:

SELECT * FROM table AS t
WHERE (@ID = 0 OR t.id = @ID);

If @ID is 0 I'll get back all rows in the table, otherwise it'll filter my results by id.

This technique often comes in handy, especially in search, where you can filter by any number of fields.
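
A minimal sketch of the pattern from client code, assuming SQLite through Python's sqlite3 module. The items table and the find_items helper are hypothetical, and 0 plays the role of the "no filter" default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(1, "apple"), (2, "pear"), (3, "plum")]
)


def find_items(connection, item_id=0):
    """Return every row when item_id is 0, otherwise only the matching row."""
    return connection.execute(
        """
        SELECT t.id, t.name
        FROM items AS t
        WHERE (:id = 0 OR t.id = :id)
        ORDER BY t.id
        """,
        {"id": item_id},
    ).fetchall()


everything = find_items(conn)
just_one = find_items(conn, 2)
print(everything)
print(just_one)
```

One query text serves both cases, so the calling code needs no if/else to build different SQL. On large tables it's worth checking the query plan, since a catch-all predicate like this can keep the optimizer from using an index.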


Answer by Jean-Francois

If you use MySQL, use Common MySQL Queries.

It shows a lot of queries that let the database do the job, instead of coding multiple queries and post-processing the results yourself.


Answer by adolf garlic

Red Gate Software's SQL Prompt is very useful.

It has auto completion, code tidy-up, table/stored procedure/view definitions as popup windows, datatype tooltips, etc.


Answer by Frederik

I have had great use of Itzik Ben-Gan's table-valued function fn_nums. It is used to generate a table with a fixed number of integers. Perfect when you need to cross apply a specific number of rows with a single row.

CREATE FUNCTION [dbo].[fn_nums](@max AS BIGINT)RETURNS @retTabl TABLE (rNum INT)
AS
BEGIN 
IF ISNULL(@max,0)<1 SET @max=1;
  WITH
    L0 AS (SELECT 0 AS c UNION ALL SELECT 0),
    L1 AS (SELECT 0 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2 AS (SELECT 0 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3 AS (SELECT 0 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4 AS (SELECT 0 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5 AS (SELECT 0 AS c FROM L4 AS A CROSS JOIN L4 AS B)
  insert into @retTabl(rNum)
  SELECT TOP(@max) ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n 
  FROM L5;
RETURN
END

Answer by crosenblum

Information schema, pure and simple.

I just had to write a small application to delete all data with tables or columns named x or y.

Then I looped that in ColdFusion and created what would have taken 20-30 lines in five lines.

It purely rocks.


Answer by jimmyorr

Combining aggregates with case statements (here with a pivot!):

select job,
       sum(case when deptno = 10 then 1 end) dept10,
       sum(case when deptno = 20 then 1 end) dept20,
       sum(case when deptno = 30 then 1 end) dept30
  from emp
 group by job

Shared with attribution, where reasonably possible, per the SO attribution policy and cc-by-something. If you were the author of something I posted here, and want that portion removed, just let me know.

↧

Do we really need cameras inside our homes? (Spoiler: NO!)


Few things make a parent as angry, on some deep primal level, as someone else screwing with their children. It doesn't matter who it is - some bullying kid, a critical grandparent, or a script kiddie with too much disposable time and money.

The news outlets covered how some Ring cameras, or more accurately some Ring camera accounts, got hacked. Or even more accurately, how people keep reusing their passwords and so it's highly likely some other site got hacked and then someone tried those passwords with Ring and got lucky.

The msm outlets weren't overly informative, but I found the following Vice article to be much more helpful. In short, use a password manager like 1Password or LastPass, stop reusing passwords, and enable 2FA for your Ring account. It should be a requirement, but ease-of-use trumps genuine security.

How Hackers Are Breaking Into Ring Cameras
After a hacker broke into a Ring camera in Tennessee and spoke to a child, Motherboard found hackers have made dedicated software for gaining access.

Technology gets hacked - that's reality now. But it raises an important question in my mind - do we really need cameras in our homes? Every living area? Our kids' bedrooms?? (The same goes for anything that listens in, 24/7.) In an ideal world, of course, the answer is "yes".. there's absolutely nothing creepy about that. 😒

Okay, as a parent, I don't doubt that most of them are trying to do what they think is best - making sure their kids are safe at home. Maybe they get home from school before the parents do. Maybe someone has a disability and could hurt themselves. In every case though, you'd need to be glued to the monitor at all times, and call someone else for help. (Something like Life Alert makes a lot more sense.)

As a techie though, I find these parents far too naive about how modern technology actually works. We're spoonfed ridiculous ideas like Ironman's magic science lab and Batman's superduper cellphone snooper. That's fun on the big screen, but reality is so much more disappointing.

Modern tech is a tangle of wires and circuits and buggy software, layers and layers developed over decades, duct taped together into something that hopefully works most of the time. The more complicated or cutting-edge the technology is, the more attractive (and often easier) a target it makes.

Hackers Remotely Kill a Jeep on the Highway—With Me in It
I was driving 70 mph on the edge of downtown St. Louis when the exploit began to take hold.

The more I learn to deal with and fix complicated technology, the less prone I am to buy even more complicated technology... unless the potential headache is really worth the benefits. Here's a few thoughts I'll leave you with:

  1. The more complicated something is, the more points of failure it has. And it will fail you eventually.
  2. If you can control a device through the Internet, so can anyone else, given enough time and effort. If you're not careful, like reusing weak passwords, that effort may be really minimal.
  3. If we survived without <insert latest tech> for the last several hundred millennia, do we really need it now? Really?

But specifically about that camera in the bedroom thing, in case my opinion wasn't clear... (it's important to be clear)

Do we really need cameras inside our homes? (Spoiler: NO!)
↧

Hands-on Ansible, using two DigitalOcean Ubuntu droplets


A few weeks ago, I took my first look at Docker and then followed it up with a slightly more technical look at how layers work. For the uninitiated, Docker allows you to build vm's in a predictable, repeatable manner as a series of layers called images. Automation is where it's at - if you think you'll have to deploy a box several times, your future self will thank you for scripting it out. If you're interested, check out my posts for an okayish intro (I hope to write more).

This week, though, I'm wrapping my head around another tool for building machines called Ansible. Note that Ansible is not an alternative for Docker, but it can actually complement it. I'll post some resources later, but right now I'm just stepping through a tutorial I found on DigitalOcean. But first...

Create two basic Ubuntu VMs

Create a DigitalOcean account and spin up two Ubuntu droplets (the green "create" button in the upper-right). A bottom-tier machine runs $5/mo, so even if you play with these for the rest of the day it'll only run ya 33¢. 🤑

Normally I'd leave "SSH keys" selected for authentication, but for now you can just select "one-time password". You'll get an email for each machine with a temp password, and then you can just open a terminal, type in ssh root@111.111.111.111 using whatever IP address you're assigned, and change the password.

Install Ansible on one of them

After you've logged into both machines, follow along with this tutorial. Pick one machine to be the "controller node", where you'll install Ansible. The other machine will be the "host" that the controller node will eventually send commands to. Everything is in the tutorial.

How to Install and Configure Ansible on Ubuntu 18.04 | DigitalOcean
Configuration management systems are designed to make controlling large numbers of servers easy for administrators and operations teams. They allow you to control many different systems in an automated way from one central location. In this guide, we will discuss how to install Ansible on an Ubuntu…

Setup the inventory (hosts file)

After installing Ansible, I set up the /etc/ansible/hosts file...
... and then verified it with the ansible-inventory command

Create an SSH key on the controller node

You'll need to create an SSH keypair on the same machine where you installed Ansible (the controller node). Just type ssh-keygen, accept all the defaults, then use ssh-copy-id to copy the public key you just created to the other machine (the host). That allows the controller node to communicate with the host.

How to Set Up SSH Keys on Ubuntu 18.04 | DigitalOcean
SSH-key-based authentication provides a more secure alternative to password-based authentication. In this tutorial we’ll learn how to set up SSH key-based authentication on an Ubuntu 18.04 installation.
Follow step 1 and step 2, both from the controller node

Here's some output from my controller node, as I was running commands. I color-coded it to make it easier to understand, but basically...

  • I tried pinging the host, which failed because SSH wasn't set up yet. (red)
  • I created an SSH keypair on the controller node. (green)
  • I verified that the keypair was created, and id_rsa.pub was present. (purple)
  • I copied the public key from the controller node to the host. (orange)
  • I ran the first command again, to ping the host. Success! (blue)

Verify that you can run Ansible commands

The authors of the tutorial suggest running the following command from the controller node, just to see that you can run commands against the host(s) you set up - although the ping command above already did that.

ansible all -a "df -h" -u root
Checking host disk usage locally, and remotely from the controller
Checking the host date from the controller, before and after changing the host timezone
Checking the uptime on a host machine

What's next?

Okay, that wasn't nearly as bad as I thought it'd be! If you were doing this in a production environment, you'd want to do way more - creating a non-root sudo user and configuring UFW to allow only the ports you need (like 22) come to mind.

Now that I've got the servers set up and communicating, I plan on going through the rest of Erika's guides. I'll save these for another day though.

How to Use Ansible: An Ansible Cheat Sheet Guide | DigitalOcean
Ansible is a modern configuration management tool that facilitates the task of setting up and maintaining remote servers. This cheat sheet-style guide provides a quick reference to commands and practices commonly used when working with Ansible.

How to Use Ansible to Automate Initial Server Setup on Ubuntu | DigitalOcean
Ansible offers a simple architecture that doesn’t require special software to be installed on nodes. It also provides a robust set of features and built-in modules which facilitate writing automation scripts. This guide explains how to use Ansible to automate the steps contained in our Initial Serve…

Configuration Management 101: Writing Ansible Playbooks | DigitalOcean
This tutorial will walk you through the process of creating an automated server provisioning using Ansible, a configuration management tool that provides a complete automation framework and orchestration capabilities. We will focus on the language terminology, syntax and features necessary for creat…

Other Resources

I said I'd post other resources, and I don't want to break such an important promise. So... here are the official Ansible docs. I find most of the posts on DO to be of high quality, but I'm not sure anyone's written guides for other flavors of Unix. If you're not using Ubuntu, the docs have steps for quite a few other systems, so check them out.

I found a couple Lynda.com videos that look promising. They partner with a lot of libraries (including mine), so you may be able to access these for free. Check with your library - it's amazing what your tax dollars are already paying for. 😅

Learning Ansible, Jesse Keating (March 2017)
Ansible Essential Training, Robert Starmer (March 2018)

If you have access to a Percipio account, I found the courses created by Joseph Khoury last year to be pretty easy to understand. I have access to it through my workplace, but I don't know if you can access it as an individual like Pluralsight et al.

And of course there's YouTube, a popular video streaming site that you may not have heard of, if you were frozen 15 years ago and just thawed out today.

A quick overview of what Ansible is, and of the GUI tool that complements it
An almost too-quick overview, but if you ran through the DO tutorial it'll make more sense
Part 2 of the above tutorial (there are 5 in all) is still pretty beginner, and something you could try yourself if you followed what I did and have an environment set up to test it

SO Vault: Break statements in the real world


StackOverflow sees quite a few threads deleted, usually for good reasons. Among the stinkers, though, lies the occasionally useful or otherwise interesting one, deleted by some pedantic nitpicker - so I resurrect them. 👻

Note: Because these threads are older, info may be outdated and links may be dead. Feel free to contact me, but I may not update them... this is an archive after all.


Break statements In the real world

Question asked by Lodle

Been having a discussion on whirlpool about using break statements in for loops. I have been taught and also read elsewhere that break statements should only be used with switch statements and with while loops on rare occasions.

My understanding is that you should only use for loops when you know the number of times that you want to loop, for example do work on x elements in an array, and while loops should be used every other time. Thus a for loop with a break can be easily refactored into a while loop with a condition.

At my university, you will instantly fail an assignment if you use break anywhere but in a switch statement as it breaks the coding guideline of the university. As I'm still completing my software engineering degree I would like to know from people in the real world.

Comments

it would depend how you use it. In my opinion those who say " never use" are wrong. Even goto statement has its uses. – Anycorn Apr 17 '10 at 3:31

@DR Well not goto. You're kind of crossing the line there. Goto is more like horseradish sauce - hardly at all if any. – bobobobo Jun 19 '10 at 17:24

I find it hard to believe there's a real university which enforces uniform coding standards on all courses. – shoosh Jun 28 '10 at 15:37

Unfortunately, all too often the university isn't anything like the real world. – Loren Pechtel Jul 28 '10 at 4:47

your uni lecturers sound like academic muppets. I recall my uni lecturers. None of them could code for shit. Those who can, do, those who can't - teach! – user206705 Nov 17 '10 at 17:40


Answer by paxdiablo

These generalized rules are rubbish as far as I'm concerned. Use what the language allows in the real world as long as it aids (or doesn't degrade) readability. The guideline against using break is no different to that against using goto. The reason behind people not liking them is because it may lead to spaghetti code, hard to follow.

Note the use of two phrases in that sentence above: The first was "guideline" instead of rule - the only rules are those imposed by the standards. Guidelines are for best practices but you have to understand the reasons behind them, not just follow them blindly.

The second was "may lead to" rather than "does lead to". There are situations where break and its brethren actually lead to more readable code than the alternative (which is often a hugely ugly condition in the looping statement).

For example, they make a lot of sense in finite state machines.
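
A minimal sketch of that state-machine case, in Python purely for illustration (the answer names no language here; parse_quoted and its three states are hypothetical). The scanner stops the moment it reaches its final state, and break says so directly:

```python
def parse_quoted(text):
    """Return the contents of the first double-quoted string in text, or None."""
    state = "OUTSIDE"          # hypothetical states: OUTSIDE, INSIDE, DONE
    chars = []
    for ch in text:
        if state == "OUTSIDE":
            if ch == '"':
                state = "INSIDE"
        elif state == "INSIDE":
            if ch == '"':
                state = "DONE"
                break          # final state reached: nothing left to scan
            chars.append(ch)
    # An unterminated quote never reaches DONE, so it is reported as None.
    return "".join(chars) if state == "DONE" else None
```

Without the break, the loop would need a guard around every branch (or a second condition in a while header) just to do nothing for the rest of the input.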

As some have pointed out, break can lead to post-conditions of a loop being variable. By that, I mean that:

for (i = 0; i < 50; i++) {
    if (someCondition) {
        break;
    }
}

can lead to i holding an indeterminate value after the loop.

But you should keep in mind that this only matters if you actually care what i is set to after the loop. If the next statement is:

for (i = 0; i < 50; i++) { ... }

then it doesn't matter at all.
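
To make the post-condition point concrete, here is a small Python sketch (an illustration only; last_index_visited is a made-up name, and unlike C, Python leaves the loop variable on the last element visited rather than one past the end):

```python
def last_index_visited(values, stop_at):
    """Loop over values, breaking at the first element equal to stop_at.

    Returns the index the loop variable holds afterwards, to show that it
    depends on whether (and where) the break fired.
    """
    i = -1                      # value if the loop body never runs
    for i, value in enumerate(values):
        if value == stop_at:
            break
    return i
```

With [5, 6, 7, 8], stopping at 7 leaves the index at 2, while a value that never appears leaves it at 3. Code after the loop has to care about the difference only if it actually reads the index.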

A piece of code like:

while (x != 0) {
    y = doSomethingWith (x);
    if (y == 0) break;
    process (y);

    z = doSomethingElseWith (x);
    if (z == 0) break;
    process (z);

    x--;
}

violates this guideline and can be refactored into something that doesn't but there is nothing unreadable about this piece of code. You can clearly see all flows of control at a single glance.

You should use the language features that make sense to your situation. Where guidelines should be enforced and where they should be ignored comes with experience.


Answer by Norman Ramsey (Apr 17, 2010)

I've been told by professors and peers at my university that using the break statement is bad practice

Come visit Tufts and our professors will tell you otherwise.

The arguments against break boil down to one principle: break requires non-local reasoning, and a language with break requires a much more complicated semantic framework than a language without break. (For the experts in the room, instead of using simple tools like predicate transformers or Hoare logic, you have to reach for something like continuations, or at the very least, a context semantics.)

The problem with this argument is that it puts simplicity of semantics ahead of programmers' real needs. There are lots of programs with natural loops that have more than one exit. Programming languages need to support these loops in a way that is more effective than introducing extra Boolean variables to govern the control flow.

For some expert testimony on the value of multiple exits from control-flow constructs, I recommend two papers:

  • Structured Programming With goto Statements by Donald E. Knuth. Don goes to great length to explain why certain kinds of gotos should be allowed in Pascal. Most of these gotos are equivalent to some form of break, which hadn't quite been invented yet when Don wrote the paper.

  • Exceptional Syntax by Nick Benton and Andrew Kennedy. The topic may seem unrelated, but throwing an exception is a nonlocal exit, just like break. (In Modula-3, break was defined to be an exception.) It's a great paper showing how language designers need to be more imaginative in designing syntax to support multiple exits.

If you really want to annoy your professors, ask them if the return statement is bad practice. If they say "no", you've got them: "But isn't return a control operator, just like break? And isn't it the case that introducing return into a structured program creates all the same semantic difficulties that introducing break does?" Watch them squirm.
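
As a rough illustration of return acting as a multi-level exit (Python, with a hypothetical find_in_grid helper that is not taken from either paper), the search below leaves two nested loops at once without any extra Boolean variables:

```python
def find_in_grid(grid, target):
    """Return (row, column) of the first cell equal to target, or None."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return (r, c)   # leaves both loops in one step
    return None                 # the "normal" exit: nothing matched
```

A single-exit version of the same search needs a flag tested in both loop headers, which is exactly the extra control variable the answer argues against.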

Is using the break statement bad practice?

No. The break statement is a valuable tool in your toolbox, just like return or exceptions. Like other tools, it can be misused, but there is nothing inherently bad about it, and in fact the break statement is pretty easy to use in sane and sensible ways.

Your professors should learn some more powerful semantic methods that can tame the break statement.


Answer by Joel (Apr 17, 2010)

This comes from the idea that there should be one way IN a method and one way OUT. Same with loops. I've had some instructors tell me that I shouldn't use more than one return or any break/continue because it creates "spaghetti code" and it's hard to follow the path. Instead, they say to set a flag and use an if statement rather than just break out. I completely disagree with this idea. I think in a lot of cases having more than one return or a break/continue statement is much more readable and easier to follow.


Answer by Stephen C

I've been told by professors and peers at my university that using the break statement is bad practice

The first thing to realize is that many of those people have never actually been professional software engineers, and never had to work on a large code base written by many developers over many years. If you do this, you learn that simplicity, clarity, consistency and use of accepted idioms are more important in making code maintainable than dogma like avoiding break/continue/multiple return.

I personally have no problems reading and understanding code that uses break to get out of loops. The cases where I find a break unclear tend to be cases where the code needs to be refactored; e.g. methods with high cyclomatic complexity scores.

Having said that, your professors have the right motivation. That is, they are trying to instill in you the importance of writing clear code. I hope they are also teaching you about the importance of consistent indentation, consistent line breaking, consistent white space around operators, identifier case rules, meaningful identifiers, comments and so on ... all of which are important to making your code maintainable.


Answer by Greg (Oct 19, 2008)

I don't see any harm in using break - it's useful and simple. The exception is when you have a lot of messy code inside your loop, it can be easy to miss a break tucked away in 4 levels of ifs, but in this case you should probably be thinking about refactoring anyway.

Edit: IMHO it's much more common to see break in a while than a for (although seeing continue in a for is pretty common) but that doesn't mean it's bad to have one in a for.


Answer by Greg B

I think it's a completely pompous and ridiculous rule to enforce.

I often use break within a for loop. If I'm searching for something in an array and don't need to keep searching once I find it, I will break out of that loop.
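
A sketch of that search in Python (find_with_count is a hypothetical helper; the comparison counter exists only to show how much work the break saves):

```python
def find_with_count(items, target):
    """Return (index, comparisons); index is -1 when target is absent."""
    comparisons = 0
    index = -1
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            index = i
            break               # found it: skip the rest of the array
    return index, comparisons
```

Searching ["a", "b", "c", "d"] for "b" stops after two comparisons; only a miss walks the whole list.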

I agree with @Konrad Rudolph above, that any and all features should be used as and when the developer sees fit.

In my eye, a for loop is more obvious at a glance than a while. I will use a for over a while any day unless a while is specifically needed. And I will break from that for if logic requires it.


Answer by Richard Harrison (Oct 19, 2008)

My rule is to use any and all features of the language where it doesn't produce obscure or unreadable code.

So yes, I do on occasion use break, goto, continue


Answer by Rob Walker (Oct 19, 2008)

I often use break inside a for loop.

The advantage of a for loop is that the iterator variable is scoped within the expression. If a language feature results in fewer lines of code, or even less indented code then IMHO it is generally a good thing and should be used to improve readability.

e.g.

for (ListIt it = ...; it.Valid(); it++)
{
  if (it.Curr() == ...)
  {
    .. process ...
    break;
  }
}

Rewriting this using a while loop would require several more lines, and leak the iterator out of the scope of the loop.

(Pedantic points: I only want to act on the first match, and the condition being evaluated isn't suitable for any Find(...) method the list has).


Answer by Cervo (Oct 19, 2008)

Break is useful for avoiding nesting. Also there are many times that it is useful to prematurely exit a loop. It also depends on the languages. In languages like C and Java a for loop basically is a while loop with an initialization and increment expression.

Is it better to do the following (assume no short-circuit evaluation)

list = iterator on something
while list.hasItem()
  item = list.next()
  if item passes check
      if item passes other check
            do some stuff
            if item passes other check
                  do some more stuff
                  if item is not item indicating end of list
                        do some more stuff
                  end if
            end if
       end if
   end if
end while

or is it better just to say

while list.hasItem()
     item = list.next()
     if check fails continue
       .....
     if checkn fails continue
     do some stuff
     if end of list item checks break
end while

For me it is better to keep the nesting down and break/continue offer good ways to do that. This is just like a function that returns multiple times. You didn't mention anything about continue, but in my opinion break and continue are of the same family. They help you to manually change loop control and are great at helping to save nesting.
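
The second form might look like this in Python. This is a sketch under assumptions: the order dictionaries, their qty/paid/id keys, and the use of None as the end-of-list marker are all invented for the example.

```python
def process_orders(orders):
    """Return the ids of orders that pass every check, stopping at a sentinel."""
    accepted = []
    for order in orders:
        if order is None:              # hypothetical end-of-list marker
            break
        if order.get("qty", 0) <= 0:   # first check fails: skip this item
            continue
        if not order.get("paid"):      # second check fails: skip this item
            continue
        accepted.append(order["id"])   # "do some stuff" at one indent level
    return accepted
```

Each failed check leaves the iteration immediately, so the work that matters stays at a single level of indentation instead of sitting four ifs deep.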

Another common pattern (I actually see this in university classes all the time for reading files and breaking apart strings) is

currentValue = some function with arguments to get value
while (currentValue != badValue) {
    do something with currentValue
    currentValue = some function with arguments to get value
}

is not as good as

while (1) {
    currentValue = some function with arguments to get value
    if (currentValue == badValue)
       break
    do something with currentValue
}

The problem is that you are calling the function with arguments to create currentValue twice. You have to remember to keep both calls in sync. If you change the arguments for one but not the other you introduce a bug. You mention you are getting a degree in software engineering, so I would think there would be emphasis on not repeating yourself and creating easier to maintain code.
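
A runnable version of the second pattern, sketched in Python (read_chunks is a hypothetical name; an empty string plays the role of badValue, which is what a text stream's read() returns at end of input):

```python
import io


def read_chunks(stream, size):
    """Collect fixed-size chunks until the stream is exhausted."""
    chunks = []
    while True:
        chunk = stream.read(size)   # the only call site to keep in sync
        if chunk == "":             # the "bad value": end of stream
            break
        chunks.append(chunk)
    return chunks
```

For example, read_chunks(io.StringIO("abcdefg"), 3) yields the chunks "abc", "def" and "g". Changing the arguments to read() now means editing one line rather than two.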

Basically anyone who says any control structure is bad and completely bans it is being closed-minded. Most structures have a use. The biggest example is GOTO. A lot of people abused it and jumped in the middle of other sub procedures, and basically jumped forwards/backwards all over the code and gave it a bad name. But GOTO has its uses. Using GOTO to exit a loop early was a good use, now you have break. Using GOTO to centralize exception handling was another good use. Now you have try/catch exception handling in many languages. In assembly there is only GOTO for the most part. And using that you can create a disaster. Or you can create our "structured" programming structures. In truth I generally don't use GOTO except in Excel VBA because there is no equivalent to continue (that I know of) and error handling code in VB 6 utilizes goto. But I still would not absolutely dismiss the control structure and say never...

Unfortunately the reality is that if you don't want to fail, you will have to avoid using break. It is unfortunate that university doesn't have more open minded people in it. To keep the level of nesting down you can use a status variable.

status variable = true
while condition and status variable = true
  do stuff
  if some test fails
    status variable = false
  if status variable = true
     do stuff
  if some test fails
     status variable = false
  ....
end while

That way you don't end up with huge nesting.


Answer by Duck (Apr 17, 2010)

Makes switch statements a whole lot easier.


Answer by Sol (Oct 19, 2008)

I would argue your teachers' prohibition is just plain poor style. They are arguing that iterating through a structure is a fundamentally different operation than iterating through the same structure but maybe stopping early, and thus should be coded in a completely different way. That's nuts; all it's going to do is make your program harder to understand by using two different control structures to do essentially the same thing.

Furthermore, in general avoiding breaks will make your program more complicated and/or redundant. Consider code like this:

for (int i = 0; i < 10; i++)
{
   // do something
   if (check on i) 
        break;
   // maybe do something else
}

To eliminate the break, you either need to add an additional control boolean to signal it is time to finish the loop, or redundantly check the break condition twice, once in the body of the loop and once in the loop's control statement. Both make the loop harder to understand and introduce more opportunities for bugs without buying you any additional functionality or expressiveness. (You also need to hoist the declaration of i out of the loop's control structure, adding another scope around the entire mess.)
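
The two shapes can be compared side by side. The sketch below is in Python rather than the C-style code above, and the function names and the log of "before"/"after" steps are invented stand-ins for "do something" and "maybe do something else":

```python
def with_break(limit, stop_at):
    """Loop with an early exit in the middle of the body."""
    log = []
    for i in range(limit):
        log.append(("before", i))      # do something
        if i == stop_at:               # check on i
            break
        log.append(("after", i))       # maybe do something else
    return log


def without_break(limit, stop_at):
    """Same behaviour, with the break replaced by a control boolean."""
    log = []
    done = False                       # extra variable that only steers the loop
    i = 0                              # counter hoisted out of the loop header
    while i < limit and not done:
        log.append(("before", i))
        if i == stop_at:
            done = True
        else:
            log.append(("after", i))
            i += 1
    return log
```

Both functions produce the same log for any inputs, but the second needs an extra variable, a compound loop condition and an else branch to get there.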

If the loop is so big you cannot easily follow the action of the break statement, then you'd be better off refactoring the loop than adding to its complexity by removing the break statement.


Answer by stakx

Edit: First of all, sorry if this answer seems somewhat long-winded. However, I'd like to demonstrate where my expressed opinion about the break statement (bold text at the end of my answer) comes from.


One very good use case for the break statement is the following. In loops, you can usually check a break condition in either of two places: Either at the loop's beginning, or at the loop's end:

while (!someBreakCondition)
{
    ...
}

do
{
    ...
} while (!someBreakCondition);

Now, what do you do when, for some reason, you cannot put your break condition in either of these places, because e.g. you first need to retrieve a value from somewhere before you can compare it to some criterion?

// assume the following to be e.g. some API function that you cannot change:
void getSomeValues(int& valueA, int& valueB, int& valueC)
//                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
//                           out parameters
{
    ...
}

while (true)
{
    int a, b, c;
    getSomeValues(a, b, c);
    bool someBreakCondition = (b == ...);

    ...   // <-- we do some stuff here
}

In the above example, you could not check for b directly inside the while header, because it must first be retrieved from somewhere, and the way to do that is via a void function. No way to put that into the while break condition.

But it may also be too late to wait until the end of the loop for checking the condition, because you might not want to execute all the code in the loop body:

bool someBreakCondition;   // declared outside the body so the loop condition can see it
do
{
    int a, b, c;
    getSomeValues(a, b, c);
    someBreakCondition = (b == ...);

    ...   // <-- we want to do some stuff here, but only if !someBreakCondition
} while (!someBreakCondition);

In this situation, there's two approaches how to avoid executing the ... when someBreakCondition is true:

1) without using break

    ...  // as above

    if (!someBreakCondition)
    {
        ...  // <-- do stuff here
    }
} while (!someBreakCondition);

2) with break

    ...  // as above
    if (someBreakCondition) break;

    ...  // <-- do stuff here
} while (true);

Your professor would obviously favour option 1) (no break). I personally think that's not a nice option, because you have to check the break condition in two places. Option 2) (use break) resolves that problem. Admittedly though, option 1) has the advantage that it becomes more easily recognisable through code indentation under what condition the ... code will execute; with option 2), you have to go up in your source code to find out that program execution might never actually get to some place further down.

In conclusion, I think using or not using break in this (quite frequent) scenario really comes down to personal preference about code legibility.


Answer by Jon Ericson (Oct 21, 2008)

break is an extremely valuable optimization tool and is especially useful in a for loop. For instance:

-- Lua code for finding prime numbers
function check_prime (x)
   local max = x^0.5;

   for v in pairs(p) do
      if v > max then break end;

      if x%v==0 then 
         return false
      end 
   end
   p[x] = true;
   return x 
end

In this case, it isn't practical to set up the for loop to terminate at the right moment. It is possible to re-write it as a while loop, but that would be awkward and doesn't really buy us anything in terms of speed or clarity. Note that the function would work perfectly well without the break, but it would also be much less efficient.

The huge advantage of using break rather than refactoring into a while loop is that the edge cases are moved to a less important location in the code. In other words, the main condition for breaking out of a loop should be the only condition to avoid confusion. Multiple conditions are hard for a human to parse. Even in a while loop, I'd consider using break in order to reduce the number of break conditions to just one.


I'm aware that this is not the most efficient prime checker, but it illustrates a case where break really helps both performance and readability. I have non-toy code that would illustrate it, but require more background information to set it up.
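
For readers who don't use Lua, roughly the same idea can be sketched in Python. This is an approximation rather than a port: primes_up_to is a hypothetical name, and it assumes the known primes are kept in an ascending list so that the square-root cutoff is valid.

```python
def primes_up_to(limit):
    """Trial division against known primes, stopping at sqrt(candidate)."""
    primes = []                        # ascending list of primes found so far
    for candidate in range(2, limit + 1):
        is_prime = True
        for p in primes:
            if p * p > candidate:
                break                  # no divisor can exist beyond sqrt(candidate)
            if candidate % p == 0:
                is_prime = False
                break                  # one divisor is enough to reject
        if is_prime:
            primes.append(candidate)
    return primes
```

Removing the first break leaves the result unchanged but makes every prime candidate get tested against all smaller primes, which is the efficiency loss the answer describes.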


Answer by Konrad Rudolph (Oct 19, 2008)

If it's a guideline that's enforced by the corrector, you don't really have the choice.

But as guidelines go, this one seems to be excessive. I understand the rationale behind it. Other people argue (in the same vein) that functions must only have one single exit point. This may be helpful because it can reduce control flow complexity. However, it can also greatly increase it.

The same is true for break in for loops. Since all loop statements are basically equivalent, one kind can always be substituted for any other. But just because you can doesn't mean that this is good. The most important coding guideline should always be “use your brain!”


Answer by Stefan Rådström (Oct 19, 2008)

In a way I can understand the professor's point of view in this matter, but only as a way to teach the students how to solve problems in a (some kind of) standard fashion. Learn to master these rules, then you are free to break against them as you wish, if that will make the code easier to understand, more effective, or whatever.


Answer by Ken Paul (Oct 21, 2008)

  1. While in school, follow the defined guidelines. Some guidelines are arbitrary, and exist primarily for consistency, ease of grading, or to keep within the teacher's limited understanding. The best balance between maximizing learning and maximizing grades is to follow the guidelines.
  2. In the real world, the balance shifts to maximizing benefit for your employer. This usually requires a focus on readability, maintainability and performance. Since programmers rarely agree on what maximizes these qualities, employers typically attempt to enforce even more arbitrary guidelines. Here the stakes are keeping your job and possibly climbing to a leadership position where you can actually influence the standards.

Answer by Vivin Paliath (Apr 17, 2010)

I tend to shy away from breaks. Most of the time, I've found that a flag is sufficient. Of course, this doesn't mean that breaks are always bad. They do have their uses. Your teachers and peers are right about it being a "get out of jail free" card. This is because breaks, like gotos, have a very high potential for abuse. It's often easier to break out of a loop than to figure out how to structure the loop correctly. In most cases, if you think about your algorithm and logic, you will find that you do not need a break.

Of course, if your loop ends up having a whole bunch of exit conditions, then you need to either rethink your algorithm or use breaks. I tend to really sit and think about my algorithms before deciding to use a break. When you write code and you get that voice in your head that says There has to be a better way than this!, you know it's time to rewrite your algorithm.

As far as loops are concerned, I use for loops when I want to iterate over a collection of items with a known bound and I know that there are no other exit conditions. I use a while loop when I want to run a series of statements over and over again until something happens. I usually use while loops if I am searching for something. In this case, I don't break out of the while. Instead, I use a flag because I think the while loop in its entirety reads more like English and I can tell what it's doing. For example:

int i = 0;
bool found = false;
while(i < size && !found) {
   found = (value == items[i]);
   i++;
}

The way I read that in English in my head is While i is less than the total count of items and nothing is found. That, versus:

for(int i = 0; i < count; i++) {
   if(value == items[i]) {
      break;
   }
}

I actually find that a bit harder to read because I can't tell immediately from the first line what the loop is actually doing. When I see a for loop, I think Ok, I'm running over a known list of items no matter what. But then I see an if in that loop and then a break inside the if block. What this means is that your loop has two exit conditions, and a while would be a better choice. That being said, don't do this either:

int i = 0;
while(i < count) {
   if(value == items[i]) {
      break;
   }
   i++;
}

That's not much better than the for with the break. To sum it all up, I'd say use break as a last resort, and only if you are sure that it actually will make your code easier to read.


Answer by Steve Jessop (Apr 17, 2010)

Those who claim that it is bad say that it's a "get out of jail free card" and that you can always avoid using it.

They may claim that, but if that's their actual argument then it's nonsense. I'm not in jail, and as a programmer I'm not in the business of avoiding things just because they can be avoided. I avoid things if they're harmful to some desired property of my program.

There's a lot of good discussion here, but I suggest that your professors are not idiots (even if they're wrong), and they probably have some reason in mind when they say not to use break. It's probably better to ask them what this is, and pay attention to the answer. You've given your "guess", but I propose to you that they would state their case better than you state it. If you want to learn from them, ask them to explain the exact details, purpose, and benefits of their coding guideline. If you don't want to learn from them, quit school and get a job ;-)

Admittedly, I don't agree with their case as you've stated it, and neither do many others here. But there's no great challenge in knocking down your straw-man version of their argument.


Answer by Jonah (Jun 26, 2010)

Here is a good use-case in PHP:

foreach($beers as $beer) {
    $me->drink($beer);

    if ($me->passedOut) {
        break; //stop drinking beers
    }
}

Answer by OregonGhost (Oct 19, 2008)

I also learned at uni that any functions should have only a single point of exit, and the same of course for any loops. This is called structured programming, and I was taught that a program must be writable as a structogram because then it's a good design.

But every single program (and every single structogram) I saw in that time during lectures was ugly, hardly readable, complex and error-prone. The same applies to most loops I saw in those programs. Use it if your coding guidelines require it, but in the real world, it's not really bad style to use a break, multiple returns or even continue. Goto has seen much more religious wars than break.


Answer by Jay Bazuzi (Oct 19, 2008)

Most of the uses of break are about stopping when you've found an item that matches a criterion. If you're using C#, you can step back and write your code with a little more intent and a little less mechanism.

When loops like this:

foreach (var x in mySequence)
{
    if (SomeCriteria(x))
    {
        break;
    }
}

start to look like:

from x in mySequence
where SomeCriteria(x)
select x

If you are iterating with while because the thing you're working on isn't an IEnumerable<T>, you can always make it one:

    public static IEnumerable<T> EnumerateList<T>(this T t, Func<T, T> next)
    {
        while (t != null)
        {
            yield return t;
            t = next(t);
        }
    }
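
The same "intent over mechanism" idea can be sketched outside C#. The Python below is an analogy, not a translation of the answer's code: first_match and walk are hypothetical helpers, with a generator standing in for the IEnumerable<T> extension method.

```python
def first_match(sequence, predicate):
    """Return the first element satisfying predicate, or None if there is none."""
    # next() pulls a single item from the lazy generator, so iteration stops
    # at the first match without an explicit break.
    return next((x for x in sequence if predicate(x)), None)


def walk(node, next_node):
    """Yield node, next_node(node), and so on, until None is reached."""
    while node is not None:
        yield node
        node = next_node(node)
```

For instance, first_match([1, 4, 9, 16], lambda x: x > 5) gives 9, and walk lets the same helper search a hand-rolled linked structure.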

Answer by DJClayworth (Oct 21, 2008)

The rule makes sense only in theory. In theory for loops are for when you know how many iterations there are, and while loops are for everything else. But in practice when you are accessing something for which sequential integers are the natural key, a for loop is more useful. Then if you want to terminate the loop before the final iteration (because you've found what you are looking for) then a break is needed.

Obey your teacher's restriction while you are writing assignments for him. Then don't worry about it.


Answer by tvanfosson (Oct 19, 2008)

I understand the issue. In general you want to have the loop condition define the exit conditions and have loops only have a single exit point. If you need proof of correctness for your code these are invaluable. In general, you really should try to find a way to keep to these rules. If you can do it in an elegant way, then your code is probably better off. However, when your code starts to look like spaghetti and all the gymnastics of trying to maintain a single exit point get in the way of readability, then opt for the "wrong" way of doing it.

I have some sympathy for your instructor. Most likely he just wants to teach you good practices without confusing the issue with the conditions under which those practices can be safely ignored. I hope that the sorts of problems he's giving you easily fit into the paradigm he wants you to use and thus failing you for not using them makes sense. If not, then you get some experience dealing with jerks and that, too, is a valuable thing to learn.


Answer by Thomas Padron-McCarthy (Mar 17, 2009)

Another view: I've been teaching programming since 1986, when I was teaching assistant for the first time in a Pascal course, and I've taught C and C-like languages since, I think, 1991. And you would probably not believe some of the abuses of break that I have seen. So I perfectly understand why the original poster's university outlaws it. It is also a good thing to teach students that just because you can do something in a language, that doesn't mean that you should. This comes as a surprise to many students. Also, that there is such a thing as coding standards, and that they may be helpful -- or not.

That aside, I agree with many other posters that even if break can make code worse, it can also make it better, and, like any other rule, the no-breaks rule can and (sometimes) should be broken, but only if you know what you're doing.


Answer by Pulsehead

You are still in school. Time to learn the most important mantra that colleges require of you:
Cooperate and Graduate.

It's good that your school has a guideline, as any company you work for (worth a plugged nickel) will also have a coding guideline for whatever language you will be coding in. Follow your guideline.


Answer by Samuel (Apr 17, 2010)

I don't think there is anything wrong with using breaks. I could see how using a break can be seen as skipping over code but it's not like a goto statement where you could end up anywhere. break has better logic, "just skip to the end of the current block".

slightly off topic...(couldn't resist)
http://xkcd.com/292/


Answer by Cervo

@lodle.myopenid.com

In your answer the examples do not match. Your logic is as follows in the example in the question and example A in your answer:

while X != 0 loop
   set y
   if y == 0 exit loop
   set z
   if z == 0 exit loop
   do a large amount of work
   if some_other_condition exit loop
   do even more work
   x = x -1

example b:
while X != 0 loop
  set y
  if y == 0
    set z
  elseif z == 0
    do a large amount of work
  elseif (some_other_condition)
    do even more work
  x--

This is absolutely not the same. And this is exactly why you need to think about using break.

First of all in your second example you probably meant for the if var == 0 to be if var != 0, that is probably a typo.

  1. In the first example if y or z is 0 or the other condition is met you will exit the loop. In the second example you will continue the loop and decrement x = x - 1. This is different.
  2. You used if and else if. In the first example you set y, then check y, then set z then check z, then you check the other condition. In the second example you set y and then check y. Assuming you changed the check to y != 0 then if y is not 0 you will set z. However you use else if. You will only check Z != 0 (assuming you changed it) if y == 0. This is not the same. The same argument holds to other stuff.

So basically given your two examples the important thing to realize is that Example A is completely different from Example B. In trying to eliminate the break you completely botched up the code. I'm not trying to insult you or say you are stupid. I'm trying to overemphasize that the two examples don't match and the code is wrong. And below I give you the example of the equivalent code. To me the breaks are much easier to understand.

The equivalent of example A is the following

  done = 0;
  while X != 0 && !done {
    set y
    if y != 0 {
      set z
      if z != 0 {
        do large amount of work
        if NOT (some_other_condition) {
          do even more work
          x = x - 1
        } else
          done = 1;
      } else
        done = 1;
    } else
      done = 1;
  }

As you can see what I wrote is completely different from what you wrote. I'm pretty sure mine is right but there may be a typo. This is the problem with eliminating breaks. A lot of people will do it quickly like you did and generate your "equivalent code" which is completely different. That's why frankly I'm surprised a software engineering class taught that. I would recommend that both you and your professor read "Code Complete" by Steve McConnell. See http://cc2e.com/ for various links. It's a tough read because it is so long. And even after reading it twice I still don't know everything in it. But it helps you to appreciate many software implementation issues.
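
To check the claim that the flag version matches example A, here is a rough Python sketch of both loops. The function names, the per-iteration input lists (ys, zs, stop_flags) and the trace strings are made up for illustration; they stand in for "set y", "set z" and "some_other_condition" in the pseudocode above.

```python
from itertools import product


def run_with_break(x, ys, zs, stop_flags):
    """Example A: leave the loop as soon as any check fails."""
    trace, i = [], 0
    while x != 0:
        y = ys[i]                       # set y
        if y == 0:
            trace.append("exit: y == 0")
            break
        z = zs[i]                       # set z
        if z == 0:
            trace.append("exit: z == 0")
            break
        trace.append("large amount of work")
        if stop_flags[i]:               # some_other_condition
            trace.append("exit: some_other_condition")
            break
        trace.append("even more work")
        x -= 1
        i += 1
    return x, trace


def run_with_flag(x, ys, zs, stop_flags):
    """The break-free rewrite: nested ifs plus a 'done' flag."""
    trace, i, done = [], 0, False
    while x != 0 and not done:
        y = ys[i]
        if y != 0:
            z = zs[i]
            if z != 0:
                trace.append("large amount of work")
                if not stop_flags[i]:
                    trace.append("even more work")
                    x -= 1
                    i += 1
                else:
                    trace.append("exit: some_other_condition")
                    done = True
            else:
                trace.append("exit: z == 0")
                done = True
        else:
            trace.append("exit: y == 0")
            done = True
    return x, trace


def all_cases_agree(x=2):
    """Compare both versions on every combination of inputs for x iterations."""
    for ys in product([0, 1], repeat=x):
        for zs in product([0, 1], repeat=x):
            for flags in product([False, True], repeat=x):
                if run_with_break(x, ys, zs, flags) != run_with_flag(x, ys, zs, flags):
                    return False
    return True
```

Comparing the two on every input combination is a cheap way to catch the kind of mismatch described above before trusting a "no break" rewrite.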


Answer by Dustin Getz (Oct 21, 2008)

break typically does make loops less readable. once you introduce breaks, you can no longer treat the loop as a black box.

while (condition)
{
   asdf
   if (something) break;
   adsf
}

cannot be factored to:

while (condition) DoSomething();

From Code Complete:

A loop with many breaks may indicate unclear thinking about the structure of the loop or its role in the surrounding code. Excessive breaks often indicate that the loop could be more clearly expressed as a series of loops. [1]

Use of break eliminates the possibility of treating a loop as a black box. Control a loop's exit condition with one statement to simplify your loops. 'break' forces the person reading your code to look inside to understand the loop's control, making the loop more difficult to understand. [1]

  1. McConnell, Steve. Code Complete, Second Edition. Microsoft Press © 2004. Chapter 16.2: Controlling the Loop.

Answer by Robert Rossney (Oct 21, 2008)

Out of curiosity, I took a little tour of the codebase I'm working on - about 100,000 lines of code - to see how I'm actually using this idiom.

To my surprise, every single usage was some version of this:

foreach (SomeClass x in someList)
{
   if (SomeTest(x))
   {
      found = x;
      break;
   }
}

Today, I'd write that:

SomeClass found = someList.Where(x => SomeTest(x)).FirstOrDefault();

which, through the miracle of LINQ deferred execution, is the same thing.

In Python, it would be:

try:
   found = next(x for x in someList if SomeTest(x))
except StopIteration:
   found = None

(It seems like there should be a way to do that without catching an exception, but I can't find a Python equivalent of FirstOrDefault.)
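
For what it's worth, Python's built-in next() accepts a default as its second argument, which gets close to FirstOrDefault without the try/except. A small sketch (the helper name first_or_default and the sample data are just for illustration):

```python
def first_or_default(items, predicate, default=None):
    """Return the first item for which predicate is true, else default."""
    # The generator is lazy, so the scan stops at the first match.
    return next((x for x in items if predicate(x)), default)


numbers = [3, 8, 15, 22]
first_even = first_or_default(numbers, lambda n: n % 2 == 0)
nothing_found = first_or_default(numbers, lambda n: n > 100)
```

The second argument is only returned when the generator is exhausted, so a legitimate match that happens to be falsy (0, an empty string) still comes back as the match.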

But if you're not using a language that supports this kind of mechanism, then of course it's OK to use the break statement. How else are you going to find the first item in a collection that passes a test? Like this?

SomeClass x = null;
for (int i = 0; i < SomeList.Length && x == null; i++)
{
   if (SomeTest(SomeList[i]))
   {
      x = SomeList[i];
   }
}

I think break is just a wee bit less crazy.


Answer by Personman

In general, anything that makes execution jump around and isn't a function call has the potential to make your code more confusing and harder to maintain. This principle first gained widespread acceptance with the publication of Dijkstra's Go To Statement Considered Harmful article in 1968.

break is a more controversial case, since there are many common use cases and it is often pretty clear what it does. However, if you're reading through a three- or four-deep nested loop and you stumble upon a break (or a continue), it can be almost as bad. Still, I use it sometimes, as do many others, and it's a bit of a personal issue. See also this previous StackOverflow question: Continue Considered Harmful?


Answer by Brian Gianforcaro (Apr 17, 2010)

I believe your professors are just trying (wisely) to instill good coding practices in you. Breaks, gotos, exit(), etc. can often be the cause behind extraneous bugs throughout code from people new to programming who don't really have a true understanding of what's going on.

It's good practice, just for readability, to avoid introducing possible extra entrances and exits in a loop/code path. That way the person who reads your code won't be surprised when they miss the break statement and the code doesn't take the path they thought it would.


Answer by Pavel Radzivilovsky

In the real world, few people care about style. However, breaking out of a loop is fine even by the strictest coding guidelines, such as those of Google, the Linux kernel and CppCMS.

The idea of discouraging break comes from a famous book, Structured Programming by Dijkstra http://en.wikipedia.org/wiki/Structured_programming that was the first one to discourage goto. It suggested an alternative to goto, and suggested principles which might have misled your professors.

Since then, a lot changed. Nobody seriously believes in one point of return, but, the goto - a popular tool at the time of the book - was defeated.


Answer by S.Lott

In the real-world, I look at every break statement critically as a potential bug. Not an actual bug, but a potential bug. I challenge the programmers I work with on every break statement to justify its use. Is it more clear? Does it have the expected results?

Every statement (especially every composite statement) has a post-condition. If you can't articulate this post-condition, you can't really say much about the program.

Example 1 -- easy to articulate.

while not X:
   blah blah blah
assert X

Pretty easy to check that this loop does what you expected.

Example 2 -- harder to articulate.

while not X:
   blah
   if something I forgot: 
      break
   blah blah
   if something else that depends on the previous things:
      break
   blah
assert -- what --?
# What's true at this point?  X?  Something?  Something else?
# What was done?  blah?  blahblah?

Not so easy to say what the post-condition is at the end of that loop. Hard to know if the next statements will do anything useful.
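
One way to keep a loop with a break honest is to write its post-condition as an explicit "either/or": either the loop condition went false, or the break condition held. A minimal sketch, using a made-up linear search (find_index is not from the original answer):

```python
def find_index(items, target):
    """Linear search that exits early with break."""
    i = 0
    while i < len(items):
        if items[i] == target:
            break
        i += 1
    # Post-condition: we either ran off the end (no match),
    # or we stopped early on a matching element.
    assert i == len(items) or items[i] == target
    return i if i < len(items) else -1
```

With a single break the disjunction stays short. Each extra break adds another clause, which is roughly why Example 2 above is so much harder to reason about.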

Sometimes (not always, just sometimes) break can be bad. Other times, you can make the case that you have a loop which is simpler with a break. If so, I challenge programmers to provide a simple, two-part proof: (1) show the alternative and (2) provide some bit of reasoning that shows the post-conditions are precisely the same under all circumstances.

Some languages have features that are ill-advised. It's a long-standing issue with language design. C, for example, has a bunch of constructs that are syntactically correct, but meaningless. These are things that basically can't be used, even though they're legal.

break is on the hairy edge. Maybe good sometimes. Maybe a mistake other times. For educational purposes, it makes sense to forbid it. In the real world, I challenge it as a potential quality issue.


Shared with attribution, where reasonably possible, per the SO attribution policy and cc-by-something. If you were the author of something I posted here, and want that portion removed, just let me know.


Being aware of how sites reel you in... and hook you 🎣


I think most of us have a general sense of uneasiness with the firm grasp the most popular sites on the Internet have on us, but it's no mistake in design that they're popular... or that the cause of uneasiness is also the salve.

Post a few thoughts and feelings to Twitter, like a few tweets and follow someone in the hope they'll reciprocate, wonder why they didn't. Check Facebook to see what old friends are up to, marvel that you're nothing like them anymore, silently judge them while hoping not to be silently judged. Check Instagram, feel jealous of someone's carefully crafted photo, unaware that someone else is jealous of yours.

Scroll the feed/timeline/whatever to see if you missed anything, just once more, okay twice. Check the news for stories that confirm your world view, make you feel more "normal" in their outlandishness, or just set you on edge. Flip back to twitter for funny cat videos, see a political post, feel your blood pressure rise. Back to Facebook to scroll again. Rinse, repeat, rinse, repeat.

Or as Nir Eyal puts it, get an itch, scratch it, get another itch, scratch it again, over and over. And you're hooked, the ultimate goal of what he terms "behavioral design".

Photo by Daria Nepriakhina / Unsplash

Cut the line... at least for a while

When I used Twitter and Facebook (I don't anymore), I didn't like how I worried about feedback. Twitter, especially, is heavily used by peers in my industry, and who doesn't want the respect of their peers? I feared missing an update. I loved and hated the ups and downs of positive and negative feedback, wondering if someone would validate what I shared, criticize it, or just tear into it. Now I only maintain a dummy account with no activity, to test my Twitter Tamer extension.

If you feel the same way, you're not alone. In fact, I'd challenge you to pick a month (as a New Year's resolution?), post a message telling everyone you're doing a social media detox (who isn't doing a detox of some sort anyway after a dozen holiday parties), disable notifications, and sign out. See how you feel. For me, it was uncomfortable for a while, like I was missing out. It passed.


Understand the (al)lure

I'm on a kick now to understand why these sites are so alluring to me. Why do I use them, why do I miss them, is it by design (yes) or merely chance (no way).

I mentioned Nir Eyal's book above. I won't do it justice here, so when you take your month off, I recommend reading it. He lays out a solid method for getting users hooked on your app, and you'll find yourself, as I did, thinking about the various apps you use and how they do exactly what Nir describes, getting you to invest your own time in it, providing variable rewards like a slot machine, etc. It's enlightening and annoying to see how easy it is to be manipulated. We only have so many keystrokes (and mouse scrolls!) left, and someone's profiting off them.

replace "scientists" with "programmers"

Even better, he goes into the ethics of whether you should do it at all, about carefully thinking of what exactly your app achieves and whether or not it's a good solution to the itch it scratches. For example, people feel the loneliness itch, and Facebook or Twitter scratch it with an endless feed and variable rewards for contributing - is that the right cure for what ails us? 🤔


Endless feed(back)

Speaking of endless feeds, there's a great article by Rob Marvin titled "The Endless Scroll". Part of it's about using tech so much you forget to eat, sleep and work, but there's some really insightful stuff in there for anyone to understand about human nature in general. What follows are some of the more enlightening quotes. I'd be shocked if most of us didn't identify with at least some of these.

From Dr. David Greenfield, a psychologist studying and treating tech addiction:

"We feel constantly overwhelmed, because we're hypervigilant in responding to a million channels of information and communication, all of which emanate out of a device that we hold in our hands, that's with us 24/7. It's become an accessory to our life in a way that we've never seen before; it's a conduit through which we function and experience our lives. That has never existed in the history of humankind."

And a quote attributed to research that Dr. Natasha Dow Schüll is doing into addiction describes what she calls "ludic loops" (yeah, weird name): the comfort (or is it a mild high?) you feel when engaged in a repetitive activity that gives you occasional rewards.

Ludic loops occur when you pick up a smartphone and start scrolling. You flick through Facebook or Twitter, read some posts, check your email or Slack, watch a few Instagram stories, send a Snap or two, reply to a text, and end up back on Twitter to see what you've missed.

Before you know it, 20 or 30 minutes has gone by; often longer. These experiences are designed to be as intuitive as possible; you can open and start using them without spending too much time figuring out how they work.

Another great quote, again from Dr. Greenfield. Of course, I have no idea what he's talking about... do you?

[W]e've become a "boredom-intolerant culture," using tech to fill every waking moment, sometimes at the expense of organic creativity or connecting with someone else in a room. When was the last time you took public transportation or sat in a waiting room without pulling out a smartphone?
Photo by Jens Johnsson / Unsplash

From Adam Alter, another psychologist studying these things. This is part of the reason the next book on my list is The Art of Screen Time.

"I think it's really important that kids are exposed to social situations in the real world, rather than just through a screen where there's this delayed feedback. It's about seeing your friend when you talk to them; seeing the reactions on their face," said Alter. "The concern is that putting people in front of screens during the years where they really need to interact with real people may never fully acquire those social skills."

Ultimately though, it's up to us to cut ties with certain tech, instead of taking to social media (oh, the irony) when a company creates something we crave. Fortunately, the process of discovering how to write an addictive app means we're simultaneously discovering how to protect ourselves from addictive apps, thanks to the work of people like Nir Eyal and others like him. Once we realize we're being duped, we tend to hit back hard.


Tools don't use themselves

One last thought, from Arianna Huffington (who wrote a popular focus app).

Technology is just a tool; it's not inherently good or bad. It's about how we use it and what it does for our lives. So phones can be used to enhance our lives or consume them. And though it sounds paradoxical, there's actually more and more technology that helps us unplug from technology. That kind of human-centered technology is one of the next tech frontiers.

Her argument is the same for any tool - a hammer can be misused to hurt someone, or it can be used to build a home for someone without one. It's always been about humans helping or hurting one another... not the tool itself.

Photo by Hunter Haley / Unsplash

If you're looking for a cliffs-notes version, someone put together a nice summary, which I think I'll hang on to for reference:

A summary of the book "Hooked: How to build habit-forming products"


3 Ansible playbooks, 2 DO droplets, and a website... in a pear tree 🎄


I started looking at Ansible last week, after finding some good intro articles by Erika Heidi. Here's the one I followed in the last post.

How to Use Ansible to Automate Initial Server Setup on Ubuntu | DigitalOcean
Ansible offers a simple architecture that doesn't require special software to be installed on nodes. It also provides a robust set of features and built-in modules which facilitate writing automation scripts. This guide explains how to use Ansible to automate the steps contained in our Initial Serve…

If you followed how I set things up in my other post, then after the script creates the "sammy" user you still won't be able to log in because you don't know the password. Just log in to the remote host as "root", run sudo passwd sammy, and you're golden. Obviously, it'd be better if I automated that part too, but whatever... this is for play.

Today I'm running through another of Erika's posts, which includes some sample playbooks to run. Plus I created a few of my own pointless playbooks.

Configuration Management 101: Writing Ansible Playbooks | DigitalOcean
This tutorial will walk you through the process of creating an automated server provisioning using Ansible, a configuration management tool that provides a complete automation framework and orchestration capabilities. We will focus on the language terminology, syntax and features necessary for creat…

Create a file and change the modification date

I created my first playbook with only two tasks. It creates an empty file using the file module, then makes it look old by changing the modification stamp.

---
- hosts: all

  tasks:
    - name: Create an empty file because reasons
      file:
        path: ~/sample_file.txt
        state: touch

    - name: Change the modification time of the empty file
      file:
        path: ~/sample_file.txt
        modification_time: "199902042120.30"

Run it with the -u flag to make it run as "sammy", and then verify that the file has been on your server for 20 years. :p

ansible-playbook my_first_playbook/playbook.yml -u sammy
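
If the timestamp format looks cryptic, it's YYYYMMDDHHMM.SS. Here's a small local sketch in Python that parses the same stamp and back-dates a temp file with os.utime. This is not what Ansible runs under the hood, just an illustration of the idea; the helper name backdate and the temp file are made up.

```python
import os
import tempfile
from datetime import datetime

STAMP = "199902042120.30"
STAMP_FORMAT = "%Y%m%d%H%M.%S"  # YYYYMMDDHHMM.SS


def backdate(path, stamp=STAMP, fmt=STAMP_FORMAT):
    """Set a file's access and modification times to the stamp (local time)."""
    when = datetime.strptime(stamp, fmt)
    epoch = when.timestamp()
    os.utime(path, (epoch, epoch))
    return when


# Create an empty file (like state: touch), back-date it, then read the mtime back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    sample_path = tmp.name
backdated = backdate(sample_path)
mtime = datetime.fromtimestamp(os.path.getmtime(sample_path))
os.remove(sample_path)
```

On the remote host itself, ls -l or stat on the file should show the same 1999 date.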

Install and remove packages

This task was taken from Erika's article, installing or updating 3 packages to the latest version. Then I added a task to remove git by setting its state to absent. I can't believe how nicely Ansible abstracts away the underlying scripts it must be running to do what it does. 👍

---
- hosts: all

  tasks:
    - name: Update some packages
      apt: name={{ item }} state=latest
      with_items:
        - vim
        - git
        - curl

    - name: Remove a package
      become: yes
      apt: name=git state=absent

Spin up a website using Apache

You'll want to copy the contents of Erika's ansible folder in the following repo.

erikaheidi/cfmgmt
Configuration Management Guide. Contribute to erikaheidi/cfmgmt development by creating an account on GitHub.

If you followed my setup using 2 DigitalOcean droplets, you'll need to add a task to allow port 80 (see below).

---
- hosts: all
  become: true
  vars:
    doc_root: /var/www/example
  tasks:
    - name: Update apt
      apt: update_cache=yes

    - name: Install Apache
      apt: name=apache2 state=latest

    - name: Create custom document root
      file: path={{ doc_root }} state=directory owner=www-data group=www-data

    - name: Set up HTML file
      copy: src=index.html dest={{ doc_root }}/index.html owner=www-data group=www-data mode=0644

    - name: Allow all access to tcp port 80
      ufw:
        rule: allow
        port: '80'
        proto: tcp

    - name: Set up Apache virtual host file
      template: src=vhost.tpl dest=/etc/apache2/sites-available/000-default.conf
      notify: restart apache
  handlers:
    - name: restart apache
      service: name=apache2 state=restarted

Here are the results. I colorized each area of output to make it easier to understand.

  • The red area shows that the only open port was for SSH, but the Ansible script configured it to allow port 80 as well (purple area).
  • The blue and green areas show the web page and apache config file, respectively.
  • The yellow area shows that apache2 has been up for nearly 7 minutes. It didn't restart Apache when I ran the script below, because I had run the playbook several times already and the apache conf file hadn't changed, so the 'setup apache virtual host file' task didn't have to run again... at least that's how I understand it.
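
My understanding of the "notify only on change" behavior, sketched in Python. This is a toy model, not Ansible's code: ensure_file stands in for the template task, the restarts list stands in for the handler, and the file content is made up.

```python
import os
import tempfile


def ensure_file(path, content):
    """Write content only if the file differs; report whether anything changed."""
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False  # already in the desired state
    except FileNotFoundError:
        pass
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


def run_play(path, content, restarts):
    """Run the 'task', then fire the 'handler' only if the task changed something."""
    changed = ensure_file(path, content)
    if changed:
        restarts.append("restart apache")
    return changed


restarts = []
conf_path = os.path.join(tempfile.mkdtemp(), "000-default.conf")
vhost = "DocumentRoot /var/www/example\n"
first_run = run_play(conf_path, vhost, restarts)   # file is new, handler fires
second_run = run_play(conf_path, vhost, restarts)  # nothing changed, no restart
```

Running the same play twice only "restarts" once, which matches the uptime in the yellow area.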

And finally, opening the little index.html page I created, which the Ansible controller node copied to the remote host. Success!


The power of Ansible is easy to see. So far, I've only played around with pushing changes to a single remote host, but I could easily spin up more droplets, modify the /etc/ansible/hosts file on the controller node (pasted below) to include them, and push out a website (or anything else I want) to every machine at once. 🤯

# This is the default ansible 'hosts' file.
#
# It should live in /etc/ansible/hosts
#
#   - Comments begin with the '#' character
#   - Blank lines are ignored
#   - Groups of hosts are delimited by [header] elements
#   - You can enter hostnames or ip addresses
#   - A hostname/ip can be a member of multiple groups

[servers]
server1 ansible_host=64.225.30.45

[servers:vars]
ansible_python_interpreter=/usr/bin/python3
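
To get a feel for how that file groups hosts, here's a rough Python sketch that reads the same layout. It only understands the [group] and [group:vars] sections used above, not the full inventory syntax, and "server2" with its 203.0.113.10 address is a made-up second droplet.

```python
INVENTORY = """\
[servers]
server1 ansible_host=64.225.30.45
server2 ansible_host=203.0.113.10

[servers:vars]
ansible_python_interpreter=/usr/bin/python3
"""


def parse_inventory(text):
    """Parse a tiny subset of the INI-style inventory into hosts and group vars."""
    groups, group_vars = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if section.endswith(":vars"):
                group_vars.setdefault(section[:-len(":vars")], {})
            else:
                groups.setdefault(section, {})
            continue
        if section is None:
            continue  # ungrouped hosts are skipped in this sketch
        if section.endswith(":vars"):
            key, _, value = line.partition("=")
            group_vars[section[:-len(":vars")]][key.strip()] = value.strip()
        else:
            name, *pairs = line.split()
            groups[section][name] = dict(p.split("=", 1) for p in pairs)
    return groups, group_vars


groups, group_vars = parse_inventory(INVENTORY)
```

Every host under [servers] picks up the variables in [servers:vars], so adding a droplet is one more line in the group.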

SO Vault: Overwhelmed by Machine Learning - is there an ML101 book?


StackOverflow sees quite a few threads deleted, usually for good reasons. Among the stinkers, though, lies the occasionally useful or otherwise interesting one, deleted by some pedantic nitpicker - so I resurrect them. 👻

Note: Because these threads are older, info may be outdated and links may be dead. Feel free to contact me, but I may not update them... this is an archive after all.


Overwhelmed by Machine Learning - is there an ML101 book?

Question asked by StackUnderflow on Feb 28, 2009

It seems like there are so many subfields linked to Machine Learning. Is there a book or a blog that gives an overview of those different fields and what each of them do, maybe how to get started, and what background knowledge is required?

Comments

+1 good question. I would be interested in this as well – Erik Ahlswede Feb 28 '09 at 21:38

It's laughable how many good, useful questions are closed on SO. This question has 155 upvotes and 234 stars at the time of this writing, and the accepted answer has 153 upvotes. – weberc2 Oct 9 '14 at 20:25

If you're not into math and are into programming, I suggest you look at this: karpathy.github.io/neuralnets – Karl Morrison Apr 1 '15 at 4:34


Answer by Jeff Moser (Feb 28, 2009)

Here's the best description I've ever heard of Machine Learning:

Machine learning is actually a software method. It's a way to generate software. So, it uses statistics but it's fundamentally... it's almost like a compiler. You use data to produce programs. - John Platt, Distinguished Scientist at Microsoft Research in his Future of AI series talk (2:17:53)

Some even argue that "everything that algorithms was to computer science 15 years ago, machine learning is today."

For more details, I'd recommend starting out with a fun intro to what's possible such as Peter Norvig's Theorizing from Data talk, a peek at what DeepMind is doing, or more recently the Future of AI series of talks (that I quoted from above).

Next get your hands dirty with Jeremy Howard's "Getting In Shape For The Sport of Data Science." It's a great pragmatic overview of actually working with data.

Once you've played around a bit, watch Ben Hamner's "Machine Learning Gremlins" for a nice pragmatic disclaimer of what can easily go wrong when doing machine learning.

I wrote a blog post "Computing Your Skill" after spending months trying to understand TrueSkill, the ML system that does matchmaking and ranking on Xbox Live. The post goes into some foundational statistics needed for further study in machine learning.

Perhaps the best way to learn is to just try it. One approach is to try a Kaggle competition that sounds interesting to you. Even though I don't do great on the leaderboards there, I always learn things when I try a competition.

After you've done the above, I'd then recommend something more formal like Andrew Ng's online class. It's at the college level, but approachable. If you've done all the above steps, you'll be more motivated to not give up when you hit some harder things.

As you continue, you'll learn about things such as R and its many packages, SciPy, Cross Validation, Bayesian thinking, Deep Learning, and much much more.

DISCLAIMER: I work at Kaggle and several of the above links mention Kaggle, but I believe they're a fantastic place to start.


Answer by Imran (Mar 01, 2009)

videolectures.net has a large collection of Machine Learning videos. One very good technical introductory lecture on the site is Machine Learning, Probability and Graphical Models by Sam Roweis.

A good overview of the field is Tom Mitchell's seminar The Discipline and Future of Machine Learning. Here is a direct link to the video [mov]. And the Syllabus page has a good list of recommended texts:


Answer by dmcer (Mar 19, 2010)

Ethem Alpaydin's Introduction to Machine Learning is a pretty accessible overview of the field.

If you're feeling overwhelmed by the other options you might want to start with it first.


Answer by Mr Fooz (Feb 28, 2009)

Two of the best textbooks out there are:

Another good resource is MIT's Open CourseWare site for their Machine Learning class.


Answer by Tirrell Payton (Feb 15, 2012)



Answer by Volatil3 (Jul 06, 2012)

Dr. Yaser Abu-Mostafa's intro course is also detailed, and he explains it quite well.

http://work.caltech.edu/telecourse.html


Answer by Matias Rasmussen (Sep 28, 2012)

I really like the Machine Learning course on Coursera. I find the short lectures very easy to digest.


Answer by theycallmemorty (Apr 01, 2009)

Artificial Intelligence: A Modern Approach is the most common text book for introductory AI courses.

Witten and Frank's book on Data Mining is a little easier to digest if that topic is what appeals to you.


Answer by Pete (Feb 28, 2009)

You are right to feel that there are lots of sub-fields to ML.

Machine Learning in general is basically just the idea of Algorithms which improve over time. If you're simply curious, some random topics that come to mind include:

Classification, Association analysis, Clustering, Decision Trees, Genetic Algorithms, Concept Learning

As far as books go:

I'm currently using Introduction to Data Mining for a course right now. It covers quite a few of the topics I've listed above and usually has examples of algorithms/uses in each section.

You don't need too much background knowledge to understand a lot of the topics. Most algorithms have some math underlying them which is used to improve the results, and you obviously need to be comfortable with general programming/data structures.


Answer by Genjuro (Dec 05, 2011)

I'd recommend you take a look at ml-class.org.


Answer by lmsasu (Feb 12, 2012)

Try A First Encounter with Machine Learning, it's a freely available course for undergraduate level.


Answer by vikram360 (Sep 12, 2011)

I've been using 'Machine Learning: An Algorithmic Perspective' by Stephen Marsland, and I think the approach is awesome. The author has put up the Python code on his site, so you can actually download the code and take a peek at how things work.

http://www-ist.massey.ac.nz/smarsland/MLbook.html


Answer by unj2 (Jul 29, 2009)

The Machine Learning subreddit has interesting links for all levels.


Shared with attribution, where reasonably possible, per the SO attribution policy and cc-by-something. If you were the author of something I posted here, and want that portion removed, just let me know.


Why are websites requesting access to motion sensors... on my desktop?


I was checking the status of a FedEx order in Brave, when I noticed a notification in the address bar that I've never seen before. It was warning me that "this site has been blocked from accessing your motion sensors". Wut? It doesn't even need to be an order status - their home page kicks it up too.

I'm struggling to understand why a website would need access to a motion sensor on a mobile device, let alone the fact I was using a desktop. Do I get a different experience if I knock my PC off the desk? Tip my monitor on its side? Grab the mouse cord and spin it around my head really fast?


After a few cursory online searches, I'm coming up with little other than a few threads on Reddit and Brave that indicate people are also seeing this on Kayo Sports and Twitch, as well as Experian and Tutanota.

Guess it's time to dig a little deeper.


What are Web APIs?

Before zeroing in on sensors, let's back up a sec and talk about web design and Web APIs. Your browser has access to a lot of data via (and metadata regarding) the device you installed it on. As much as some of the websites you visit would looove to have access to all that data, any decent browser acts as a firewall, blocking that access by default and prompting you to allow it.

Geolocation API

One of the more common APIs is the one used to request your location, usually when you're using a website's "store locator" to find the store nearest you.

The button below uses code (lightly modified) from MDN's Geolocation API docs. When you click it, the JavaScript code executes a call to navigator.geolocation.getCurrentPosition(), asking the browser for your location.


Your browser prompts you to allow access, which you can deny. Yay privacy.

If you don't see the prompt but you think you've allowed it, there are two different settings that control access - a global page with a list of "blocked" and "allowed" sites, and a per-site page where you can adjust all permissions for a single site. In Chrome, just replace brave:// with chrome:// in the address bar.

Notifications API

Another (unfortunately, very) popular API is the one used to display notifications to visitors. Using the Notifications API, you can request permission from a visitor with a call to Notification.requestPermission() and then just create a new Notification() to annoy them, er, keep them up to date. (May not work in Brave due to a bug.)

Sensors API

There's a (maybe sorta?) new API for requesting access to sensors in Chromium-based browsers (Ghacks puts it at Chrome 75, around June 2019, but Wikipedia suggests Chrome 67 around May 2018). It's not widely supported yet. According to MDN, the only major browsers that currently support it are Chrome and Opera, on desktop and mobile.

Check out the MDN docs, the W3C candidate recommendation, the ongoing conversation over at Chrome, and Intel's Sensor API playground for examples.

The following links execute some JavaScript code to try starting up various sensors, which should trigger the sensor icon in the address bar. (If an error occurs, it'll display below the links.)


As with the geolocation and notification APIs, you can grant or deny access at the global or per-site level. What's kind of annoying is that all of the above sensors fall under a single "motion sensors" umbrella, so you can't easily tell which of those sensors a particular site is trying to access.


Why are certain sites requesting the Sensors API?

That's the hundred-dollar question. I see it on FedEx and Kayo Sports (every time) and Twitch (sometimes). I'm sure there's other sites too, but the question is, why do sites as varied as these want access to a gyroscope or accelerometer?

Why are websites requesting access to motion sensors... on my desktop?

I haven't confirmed anything, but if I had to guess, I'd say they're all using the same library, and it got changed. Like all modern development, websites are built upon layers and layers of libraries that depend on other libraries. Somewhere down the line, I wonder if one is requesting access to an API that it doesn't need? After poking around a bit, I didn't see anything obvious, but then some of the scripts were obfuscated so there's little chance of figuring those out.

Your guess is as good as mine, but like all the Web APIs, if you don't believe a site needs the data it's requesting access to, tell your browser to block it!


How to deploy your own private RequestBin instance in under 5 minutes


If you've ever needed to consume a webhook from another service, say from Stripe or GitHub, but you weren't completely sure what the payload was going to look like, a tool like RequestBin can help. By setting it as the "target" for the webhook, it intercepts and displays whatever's sent its way.

Same goes if you're developing a REST API and want to make sure that your POST and PUT actions are sending what you expect. You could develop a separate app that consumes your API the way your customers will and displays the results, but why bother with the overhead?

The same team that designed RequestBin (which seems to be abandoned, but more on that below) used to host a public instance of it for anyone to use too, but such services don't seem to last, and theirs didn't either once the VC money dried up. It's got to be expensive hosting something like that for thousands (tens of thousands? hundreds?) of users for free. 💸


Deploy with DigitalOcean in <5 minutes

Fortunately, the makers of RequestBin also made it really easy to deploy on your own. Just create a DigitalOcean droplet with Docker preinstalled; unless you know you're going to need more resources, the basic $5/mo plan is sufficient. It should only take a minute or so to spin up.


Connect to your new VM, most likely with ssh root@<your-droplet-ip-address>, and then run the commands in the readme. The build command takes a few minutes on its own, but the up command should only take a few seconds.

git clone https://github.com/Runscope/requestbin.git
cd requestbin
sudo docker-compose build
sudo docker-compose up -d

Assuming no errors in the output, just paste <your-droplet-ip-address>:8000 into your favorite browser, and away you go!


Create your first RequestBin and POST some data with a simple curl command like they suggest. Update the page and you should see your data listed.
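If you'd rather send the test request from script instead of curl, a sketch like this should do the same job. The postToBin name and the injectable fetchImpl argument are made up here; the bin URL is whatever your own instance shows for the bin you created.

```javascript
// POST a JSON payload to a bin and return the HTTP status code.
// `fetchImpl` defaults to the global fetch (browsers, Node 18+); making it
// injectable is an assumption of this sketch.
async function postToBin(binUrl, payload, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(binUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  return response.status;
}
```

Call it with your bin's URL and any object, for example postToBin(binUrl, { hello: "world" }), then refresh the bin's page to see the request.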

You can also use a tool like Postman to make requests to the endpoint, and even save them for future use - something I've made extensive use of while learning and writing about various APIs.


Changing Built-in Settings (i.e. max TTL, max requests, and port)

There's some settings, like a max of 20 requests, that make sense if you've got an environment that thousands of people will be using. But since it's just you, and maybe a small team, I'd say you could safely increase those a bit.

If the container is up and running, bring it down now and verify it's gone.

root@docker-s-1vcpu-1gb-nyc3-01:~# docker container ls
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
0f9ecfdde471        requestbin_app      "/bin/sh -c 'gunicor…"   25 minutes ago      Up 25 minutes       0.0.0.0:8000->8000/tcp   requestbin_app_1
99415b11ab7c        redis               "docker-entrypoint.s…"   25 minutes ago      Up 25 minutes       6379/tcp                 requestbin_redis_1

root@docker-s-1vcpu-1gb-nyc3-01:~# cd ~/requestbin/

root@docker-s-1vcpu-1gb-nyc3-01:~/requestbin# sudo docker-compose down
Stopping requestbin_app_1   ... done
Stopping requestbin_redis_1 ... done
Removing requestbin_app_1   ... done
Removing requestbin_redis_1 ... done

root@docker-s-1vcpu-1gb-nyc3-01:~/requestbin# docker container ls
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Open the requestbin/config.py file and change some of these values.

  • The BIN_TTL is the time to live in seconds, so if you want your requests to live for a year, then set BIN_TTL = 365*24*3600
  • There's no reason to only hold on to 20 requests; if you like, you could set MAX_REQUESTS = 2000 or some other value. If you set it to a million and everything crashes... not my fault.

While you're at it, you could make it so you don't have to enter a port either, since presumably you're not running anything else on this tiny server.

  • Edit docker-compose.yml and change the "ports" section to "80:8000"
  • Edit Dockerfile to EXPOSE 80
  • Remove the current requestbin_app image with docker image rm
  • Run sudo docker-compose up -d again and verify your changes took effect

Some of the values are also hard-coded into the HTML page, so even after doing all the above, the page will probably still tell you you're limited to 20 requests. It lies. If you run the curl command 30 times now, you'll see 30 requests on the page.


Other Considerations

So, hopefully you haven't been passing anything too sensitive to your RequestBin instance yet, because right now it's all plain-text. If you need to pass secure data, consider setting up SSL. That's not something I'm delving into here - not yet, anyway.

What I am doing is copying the original project which, as I mentioned, seems to be abandoned. They shut down the public RequestBin site (understandably), but also haven't merged in PRs or addressed issues for nearly two years.

grantwinney/requestbin
Inspect HTTP requests. Debug webhooks. Originally created by @progrium. Since the original project appears abandoned, this is for merging PRs and addressing issues.

I was going to fork it, which is the usual way to make updates that might someday be merged back in, but it seems that GitHub warns repo owners (not forks) of security vulnerabilities... and even opens issues on your behalf. Nice! My intention is to merge the pending PRs and try addressing some of the issues myself, but we'll see how that goes.


Sharing is Caring


I've heard it said that a business's purpose, reduced to a single point, is to separate you from your money. A little cynical, sure... but not wrong. A company might appeal to your compassion, outrage, or sensibilities, prey on your doubts and fears, win you over with clever marketing, or just scratch an itch you didn't even know you had. Some do a nice job of reinvesting in their employees and community. But at the end of the day, it's (ultimately) about keeping the lights on.

It's hard enough to see past our own feelings, to judge a product on its own objective merits, but at least when we buy into something, we can usually just as easily stop buying into it. Sure, there's contracts and whatnot, but when you don't renew, the problem is solved.

But these days, the hot commodity isn't the green stuff in your wallet. The modern business's purpose, reduced to a single point, seems to be to separate you from your data. And as it happens, I stumbled across several instances of data mining wrapped up as "features" just last week.


LinkedIn cares about your team

After I accepted a connection from a legit coworker, LinkedIn asked if they were a current or past coworker. Odd.. surely they could determine that from our mutual "experience" sections.

Ah, they've got a "teammates" feature, and they want to start the ball rolling on my filling in details about working relationships. And while I'm at it, I could pony up details about my other connections too. This is all for my convenience of course - never miss an important update!

Perceived Benefit: By telling LinkedIn who your current and past manager, direct reports, and team members were/are, you'll be fed more relevant updates and stories in your timeline. If you do (or did) work for a corporation with thousands of employees, then having the same company listed on both your profiles doesn't necessarily mean you want to know everything about that person, so you can feed their algorithm and get the more relevant (to you) stuff.

Data Opportunity: LinkedIn already knows who works (or did work) for the same company, but this gives them access to something they don't have - your corporate structure. If enough people feed the machine info about who they work with and in what capacity, it's easily possible for LinkedIn to piece together a company's internal structure.

Potential Risk: Many companies publicize their top leadership, but not many list everyone. And few to none would make their complete org chart public, but that's exactly what LinkedIn would have. Is that considered sensitive info? I'm not sure. It seems like the kind of thing most companies would like to remain confidential.


Facebook cares about your health

Have you heard about Facebook's new Preventive Health portal? Hand over your medical information to FB, and they'll make recommendations on what kinds of other preventive measures you should take.

Perceived Benefit: Facebook will show you recommendations for preventive health, based on your age and gender. If you let them know what medical checkups you've already completed over the years, they'll make it more personalized.

Data Opportunity: Facebook gets access to your medical data. I can't imagine what kind of new and amazing ads they can start targeting you with once they realize the kinds of medical treatments you're having done, or are scheduled to do. They say, "we're starting with health checkups related to heart disease, cancer and flu", so if this is remotely successful, it'll almost certainly extend to all kinds of medical issues.

Not to mention, "at this time, Preventive Health is only available on the Facebook mobile app", for reasons I can't even guess. All the mobile app requests access to is your contacts, calendar, phone log, app history, microphone, camera, location, media, sms, so... seems legit.

Potential Risk: In the US, "The HIPAA Privacy regulations require health care providers and organizations, as well as their business associates, to develop and follow procedures that ensure the confidentiality and security of protected health information (PHI) when it is transferred, received, handled, or shared."

But in Facebook's own words:

  • Facebook doesn't endorse any particular health care provider.
  • Locations don't pay Facebook to be included on maps in Preventive Health.
  • Neither Facebook, nor any of its products, services or activities are endorsed by CDC or the U.S. Government.

Facebook isn't a health care provider, nor a health care provider's business associate, so no HIPAA. Besides, given their abysmal record on data breaches, I'd be wary of any promises they made anyway.


Amazon cares about your privacy

And someone on Twitter posted this warning they got when visiting Amazon.com with the Honey browser extension installed. It's an extension that monitors what you're shopping for, and lets you know if it's cheaper somewhere else or there's coupons. I installed Honey to try replicating it, but couldn't.


Perceived Benefit: Amazon is issuing a public service announcement, trying to protect you (their loyal customer) from the harms of a rogue browser extension.

Data Opportunity: Amazon has their own browser extension, which has far fewer ratings and a lower overall rating, and which (according to the permissions it requests) can access the sites you visit, access your bookmarks and location, manage your other extensions, observe and analyze network traffic and intercept, block, or modify that traffic, similar to (but more extensive than) the Honey extension.

Potential Risk: If all you do is uninstall Honey, there's not really any risk. If you replace it with Amazon's extension, I'd say you're giving up even more data, feeding the giant machine. They promise to help you "price compare across the web", but that's a difficult pill to swallow. Besides, it's a little tough to buy the "security risk" bit from a company that wants to install listening devices into your home, sometimes with unexpected, shady results.

As a side note, I don't even know how Amazon managed to detect that Honey was installed. I can think of two possibilities. First, maybe the twitter poster had the Amazon extension installed too. With the "management" permission, it might be able to detect Honey by itself. I tried it, but I didn't get a warning.

The other possibility is that they're running a bit of javascript code client-side, sometime after the page loads, that detects Honey. Sites do stuff like that with ad blockers frequently, and it's fairly trivial.

For example, I uploaded a small file called "ads.js" to my site (view it here), which is basically guaranteed to be blocked by ad blockers. Then below, on this post only, I attempt to load the script and run a second script that detects whether the variable in that file was created. If it wasn't, then your ad blocker blocked the "ads.js" script. Try disabling your ad blocker and refresh the page, and the message below should change.
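The check itself can be tiny. Here's a sketch of the idea, where the variable name adsScriptLoaded and the "Adblocker detected!" text are made up for illustration (the real ads.js may use something else):

```javascript
// ads.js (the bait file) would contain a single line such as:
//   var adsScriptLoaded = true;   // hypothetical variable name
// A top-level `var` in a classic script becomes a property of the global object.

// Runs after the page has tried to load ads.js.
function adBlockerMessage(globalObject = globalThis) {
  const baitLoaded = typeof globalObject.adsScriptLoaded !== "undefined";
  // If the bait variable never showed up, something blocked the script.
  return baitLoaded ? "No Adblocker!" : "Adblocker detected!";
}
```

The second script then just writes the returned message into the page.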

No Adblocker!

Since Honey probably injects something into the page to help users get the best deal, Amazon could inject some code into their page that inspects the DOM for some element that they know Honey creates, and then display a warning at the top.
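As a sketch of what that inspection could look like: the selector below is entirely made up (I don't know what, if anything, Honey adds to the page), and warnIfExtensionDetected is a hypothetical helper, not anything taken from Amazon's code.

```javascript
// Returns true and shows a warning if the page contains an element that a
// given extension is assumed to inject. `doc` is normally `document`.
function warnIfExtensionDetected(doc, selector, showWarning) {
  const detected = doc.querySelector(selector) !== null;
  if (detected) {
    showWarning("A browser extension on this page may be a security risk.");
  }
  return detected;
}

// Hypothetical usage in a page, with a placeholder selector:
//   warnIfExtensionDetected(document, "#some-extension-widget", alert);
```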

Conflict of interest, anyone?


There's so much caring going on, I'm getting weepy

I could go on and on with other examples, but the moral of the story is that when a company rolls out a new "feature" that exchanges your personal data - especially something they wouldn't otherwise have access to - in exchange for a little convenience, take a second or two to think about what they stand to profit from it.

Once that data is in their hands, you can't do much about it, and they may be able to profit from it any way they like for a long time.


The Start of the Web


Ever gone looking for the end of the Internet? It's not hard to find.. turns out there's dozens of ends. Which makes sense really, since the web is by definition nonlinear.. it's more of a mish-meshy, hydra sorta thing that's forever expanding. To make your very own "end" though, just create a page with absolutely no outgoing links. Ta-da.


Beginnings

A better trick is finding the beginning of the Internet. Well.. it's still there!

Over the years, servers moved around, systems were reformatted, files were lost... history became legend, legend became myth and for two and a half decades, the first page passed out of all knowledge until, when chance came in 2013, it was rehosted at its original domain. But probably not on Tim Berners-Lee's original NeXT machine. 😏

Check it out, in all its simplified simplicity, including a list of people at CERN who were developing the WorldWideWeb project, some help docs, a list of other servers connected to this early "web", etc. I can't believe they hung on to them.. I can't even find my first website, and I know I saved it somewhere...

It's.. quaint, how few servers there were in the beginning. Most of them seem to be lost forever, but the Internet Archive crawled some before they disappeared, such as:

There were also attempts, pre modern search engines obviously, to manually index the entirety of the web. One attempt was Netscape's Open Directory Project, whose goal was to "produce the most comprehensive directory of the web by relying on a vast army of volunteer editors".


Good Intentions

It strikes me, looking at that list of servers, that the beginning of the web was full of good intentions. Scientists, engineers, professors.. universities, science labs... the WorldWideWeb project was supposed to bring together all kinds of centers of learning.

From a 1995 talk by Berners-Lee, called Hypertext and Our Collective Destiny:

I had (and still have) a dream that the web could be less of a television channel and more of an interactive sea of shared knowledge. I imagine it immersing us as a warm, friendly environment made of the things we and our friends have seen, heard, believe or have figured out.

I would like it to bring our friends and colleagues closer, in that by working on this knowledge together we can come to better understandings. If misunderstandings are the cause of many of the world's woes, then can we not work them out in cyberspace. And, having worked them out, we leave for those who follow a trail of our reasoning and assumptions for them to adopt, or correct.

Unfortunately, it's far outgrown that early vision, morphing into something that at times is vitriolic and ugly, used for exploitation or maliciousness. Don't blame the tool - it was as inevitable as the human spirit. How people use it reflects what's already in their hearts, and there's a lot of good on the web too.


Learn More

To learn more about the beginnings of the 'Net, start with the birth of the web. You can also run the WorldWideWeb browser from your browser - browserception!

For about a year after the first website was made available again in 2013, there was a pretty concerted effort to restore and maintain archives of a lot of software and hardware related to the same timeframe, which you can read about here.

Then check out the origins of the NIKHEF site and a personal account by Willem van Leeuwen, stories of the first webpages created at UNC, and even the original 1989 proposal for the web!


What we've got here is a failure to communicate


Let's be honest, most of us don't expect good support from a big company.

If you go to a mom-and-pop pet shop, it's reasonable to assume they love animals and would offer great advice for your ailing pet. If you go to a corner book shop, the few employees working there likely love books and would enjoy helping with a rare find. When I visited a family-owned lumberyard to build a dining room table, they were passionate, and I got some great advice.

But if you go to Lowe's, you can expect to find supplies for 5 or even 50 home projects in one store, but little to no expertise. If you go to Walmart, you can expect to find nearly anything (of questionable quality, perhaps), but you don't expect passion from the garden center when it comes to the best plants for your yard, or why a certain rhododendron intended for shady areas isn't doing well.

And if you turn to the trillion dollar behemoth that is Google for one of their many random products, you can expect it to be secure (from everyone but them), easy to use, and mostly reliable. But if you try to help someone connect their Android phone to their Google account, and the sync process wipes out their local contacts instead of merging them (true story), you realize very quickly that Google doesn't post solutions or workarounds to the various forums where users report such things.


My latest foray into Google's abysmal support started last week, when I pushed an update for one of my browser extensions to their monopolistic web store. I wrote Twitter Tamer a couple years ago to hide elements of the UI and make it less distracting, and simply added another option when I got an unexpected email.

They don't include their name, so I'll just refer to them as Joshua.


Joshua: Shall we play a guessing game?

From: Chrome Web Store Developer Support
Date: 2/6/2020 12:45 PM
Subject: Chrome Web Store: Removal notification for Twitter Tamer

Dear Developer,

Your Google Chrome item "Twitter Tamer" with ID: aflapchiclhldkgbbahbdionenmhkoed did not comply with our policies and was removed from the Chrome Web Store.

Your item did not comply with the following section of our Program Policies:

"User Data Privacy"

Your product violates the "Use of Permissions" section of the policy, which requires that you:

Request access to the narrowest permissions necessary to implement your product's features or services.

If more than one permission could be used to implement a feature, you must request those with the least access to data or functionality.

Don't attempt to "future proof" your product by requesting a permission that might benefit services or features that have not yet been implemented.

Once your item complies with Chrome Web Store policies, you may request re-publication in the Chrome Web Store Developer Dashboard. Your item will be reviewed for policy compliance prior to re-publication.

If you have any questions about this email, please respond and the Chrome Web Store Developer Support team will follow up with you.

Important Note:

Repeated or egregious policy violations in the Chrome Web Store may result in your developer account being suspended or could lead to a ban from using the Chrome Web Store platform.

This may also result in the suspension of related Google services associated with your Google account.

Sincerely,

Chrome Web Store Developer Support

------------------------------------------------------
Developer Terms of Service
Program Policies
Branding Guidelines

I checked online to verify, and sure enough:


Me: I'd prefer you just spell it out.

2/6/2020 1:14 PM

What?? I request access to "activeTab" and "storage" (for saving settings), and the "matches" section is only for Twitter's domain. If you see a way to limit that more and have the extension still function, by all means please share.

Something flagged my code for permissions, but Google only specializes in machine learning, not people learning, so they don't think it's important to share which permissions were flagged.


Joshua: No, a guessing game!

2/9/2020 6:34 AM

Dear Developer,

Upon review of your Product, [Twitter Tamer ], with ID: [aflapchiclhldkgbbahbdionenmhkoed], we find that it does not comply with the Chrome Web Store's User Data Policy, and it has been removed from the store.

Your Product violates the "Use of Permissions" section of the policy, which requires that you:

Request access to the narrowest permissions necessary to implement your Product's features or services. If more than one permission could be used to implement a feature, you must request those with the least access to data or functionality.

Don't attempt to "future proof" your Product by requesting a permission that might benefit services or features that have not yet been implemented.

To reinstate your Product, please ensure that your Product requests and uses only those permissions that are necessary to deliver the currently stated product's features.

If you'd like to re-submit your Product, please modify your Product so that it complies with the Chrome Web Store's Developer Program Policies, then re-publish it in your Developer Dashboard.

Please reply to this email for questions / clarifications regarding this Product removal.

Thank you for your cooperation,

Google Chrome Web Store team

---------------------------

Resubmission

If you resubmit your Product, it will not be immediately published live in the store. All re-submitted Products undergo a strict compliance review and will be re-published only if the Product passes that review.

Important Note

Repeated or egregious violations of the policies may result in your developer account being banned from the store. This may also result in the suspension of related Google services associated with your Google account. All re-submitted Products will continue to be subject to the Chrome Web Store Program Policies and Terms of Service.

Program Policies:
https://developers.google.com/chrome/web-store/program_policies

Developer Terms of Service:
https://developers.google.com/chrome/web-store/terms

User Data Policy Chromium Blog Post:
http://blog.chromium.org/2016/04/ensuring-transparency-and-choice-in.html

Branding Guidelines:
https://developers.google.com/chrome/web-store/branding

User Data Policy FAQ:
https://developer.chrome.com/webstore/user_dat

Chrome Web Store Developer Dashboard https://chrome.google.com/webstore/developer/dashboard

Me: Is one of these the issue?

2/9/2020 7:40 AM

It took 2 days for someone to send the same canned response?! Let's try again.

I request access to "activeTab" and "storage" (for saving settings), and the "matches" section is only for Twitter's domain. I need to know exactly what to change to be approved. What specific change do I need to make to the permissions?

Joshua: Sure, let's roll with that.

2/10/2020 7:12 AM

Dear Developer,

Upon review of your Product, Twitter Tamer, with ID: aflapchiclhldkgbbahbdionenmhkoed, we find that it does not comply with the Chrome Web Store's User Data Policy, and it has been removed from the store.

Your Product violates the "Use of Permissions" section of the policy, which requires that you:

Remove activeTabs permission.

Request access to the narrowest permissions necessary to implement your Product's features or services. If more than one permission could be used to implement a feature, you must request those with the least access to data or functionality.

Don't attempt to "future proof" your Product by requesting a permission that might benefit services or features that have not yet been implemented.

To reinstate your Product, please ensure that your Product requests and uses only those permissions that are necessary to deliver the currently stated product's features.

If you'd like to re-submit your Product, please modify your Product so that it complies with the Chrome Web Store's Developer Program Policies, then re-publish it in your Developer Dashboard.

Please reply to this email for questions / clarifications regarding this Product removal.

Thank you for your cooperation,

Google Chrome Web Store team

---------------------------

Resubmission

If you resubmit your Product, it will not be immediately published live in the store. All re-submitted Products undergo a strict compliance review and will be re-published only if the Product passes that review.

Important Note

Repeated or egregious violations of the policies may result in your developer account being banned from the store. This may also result in the suspension of related Google services associated with your Google account. All re-submitted Products will continue to be subject to the Chrome Web Store Program Policies and Terms of Service.

I made a mistake after this, and wasted my time removing that permission. Note that all "activeTab" pretty much does is give you access to the current page only when the user clicks your extension's icon in the toolbar. I used it to enable/disable the extension and offer to refresh the page for the user, only if they were actually on Twitter's site.
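To give a feel for how little that permission is used for, here's a rough sketch of the "refresh only if they're on Twitter" idea. It is not the extension's actual code: isTwitterUrl and refreshIfOnTwitter are made-up names, and the chrome object is passed in as chromeApi so the logic can be tried with a stand-in.

```javascript
// Is this URL on Twitter's domain?
function isTwitterUrl(url) {
  try {
    const host = new URL(url).hostname;
    return host === "twitter.com" || host.endsWith(".twitter.com");
  } catch {
    return false; // url missing or unparseable (e.g. no access to the tab)
  }
}

// Meant to run from the extension's popup after the user clicks the toolbar
// icon, which is when activeTab grants temporary access to the current tab.
function refreshIfOnTwitter(chromeApi, onDone) {
  chromeApi.tabs.query({ active: true, currentWindow: true }, (tabs) => {
    const tab = tabs[0];
    const onTwitter = Boolean(tab) && isTwitterUrl(tab.url);
    if (onTwitter) chromeApi.tabs.reload(tab.id);
    onDone(onTwitter);
  });
}
```

In the popup this would be called as refreshIfOnTwitter(chrome, ...). Without the permission (or a broader one), tab.url comes back undefined and the helper simply does nothing.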

Whatever.. I removed it and the corresponding functionality, and moved some things around to make it easy to still enable/disable the extension, and advise the user to refresh manually to see the changes. When I tried to upload it, I got this message, so I made sure all the fields were filled in. Should be good now, right?


The next morning, I got another email, flagging the extension for a completely different reason. Idiots.


Joshua: Shall we play a game of whack-a-mole now?

2/11/2020 5:09 AM

Dear Developer,

Your Google Chrome item "Twitter Tamer" with ID: aflapchiclhldkgbbahbdionenmhkoed did not comply with our policies and was removed from the Chrome Web Store.

Your item did not comply with the following section of our Program Policies:

"Spam and Placement in the Store"

Item has a blank description field, or missing icons or screenshots, and appears to be suspicious.

Once your item complies with Chrome Web Store policies, you may request re-publication in the Chrome Web Store Developer Dashboard. Your item will be reviewed for policy compliance prior to re-publication.

If you have any questions about this email, please respond and the Chrome Web Store Developer Support team will follow up with you.

Important Note:

Repeated or egregious policy violations in the Chrome Web Store may result in your developer account being suspended or could lead to a ban from using the Chrome Web Store platform.

This may also result in the suspension of related Google services associated with your Google account.

Sincerely,

Chrome Web Store Developer Support

------------------------------------------------------
Developer Terms of Service
Program Policies
Branding Guidelines

Me: What kind of game are you really playing?

2/11/2020 7:47 AM

This extension has been around for quite awhile. I don't obfuscate the code or do anything else that's suspicious. It's certainly not spam. The whole thing is open source and available for anyone to pick apart.

I took the time to remove the activeTab permission, which you were complaining about before. Now you're rejecting it for a completely different reason. What in the world is really going on? Do you have an interest in Twitter, and approving this extension is now some conflict of interest for Google?

At this point, I'm flummoxed. It's got a description, screenshots, icons, blah blah blah, so the only thing left is it "appears to be suspicious". What do you do with that? How do you fix, "some Google employed automaton finds my extension suspicious"?

Okay, the code ain't the prettiest, and I'm sure someone with more JS experience could write something lovely and worthy of the Google overlords' approval, but I don't attempt to hide anything whatsoever. Over a thousand people find it useful, and it has an overall positive rating.


Joshua: The kind where I win, and you lose.

2/12/2020 3:07 AM

Dear Developer,

Your item, "Twitter Tamer," with ID: aflapchiclhldkgbbahbdionenmhkoed, did not comply with our Developer Program Policies and was removed from the Google Chrome Web store.

Your item did not comply with the following section of our policy:

We may remove your item if it has a blank description field, or missing icons or screenshots, and appears to be suspicious

If you'd like to re-submit your item, please make the appropriate changes to the item so that it complies with our policies, then re-publish it in your developer dashboard. Please reply to this email for issues regarding this item removal.

*Please keep in mind that your re-submitted item will not be immediately published live in the store. All re-submitted items undergo a strict compliance review and will be re-published if the item passes review.

*Important Note

Repeated or egregious violations in the store may result in your developer account being banned from the store. This may also result in the suspension of related Google services associated with your Google account.

All re-submitted items will continue to be subject to Chrome Web Store policies and terms of service.

Thank you for your cooperation,

Google Chrome Web Store team

Final Thoughts: The only winning move is not to play.

Okay, that's pretty dramatic. I won't dump Google completely, but I will start relegating them to the corner. They provide some really useful services, but given the support they offer, I feel I can't trust them with anything truly valuable - emails, contacts, photos, etc. Or my contributions to their ecosystem.

I created the extension to help others, it achieved that, and suddenly Google pulled the rug out. Maybe if I had a social media presence to call them out then I'd get more info... or maybe not. I don't use Twitter anyway, so it's just not worth the effort for me. I'll post some workarounds for Twitter Tamer, and I'll also be writing a series of posts on how to avoid Google's stranglehold on your own tech life.

One final thought. Having a few "official" stores, each tightly integrated with their own platform, seems like a bad idea. The perceived value is that they're keeping their respective users safe, but in reality they have their own interests... and perhaps their users are only a small part of that.

Apple has blocked Telegram from updating its iOS app, says founder
Some Telegram features, like stickers, don’t work after the iOS 11.4 update.
↧

How to make a dark theme for your blog that automatically adjusts for your visitors

Every time I learn some new piece of CSS I'm amazed at how flexible and powerful it is, and the prefers-color-scheme media feature is no exception. The "dark mode" setting from a visitor's desktop or mobile device can be passed to the browser, which then applies it to your site according to your style sheets. So. Cool.

MDN has a good example and lots of notes, as usual (I love their docs!), and I created my own fun little example below. To try it out, toggle between light and dark mode on your device, and the sun should change to a moon. If it doesn't work for some reason, you can see what it should do in the screen capture at the bottom of this post. 🌞 🌜

How's it work?

Without diving too deep, here's a few pieces to the puzzle...

Responsive Design

It's possible (and has been for years) to design a website that responds to the device a visitor happens to be using, such as the wildly different screen sizes of a mobile device and a desktop. This could involve writing JavaScript, but as CSS is given more power, it's able to handle most layouts all by itself.

Media Queries

One element of CSS that figures into responsive design is the media query, which can account for things like screen resolution, orientation, and whether the visitor prefers higher contrast colors.

For this to work, a device has to make this data available to the browser, which in turn has to use it to apply the correct CSS layout to the page. Different layouts result in different color schemes, collapsed menus, sidebars dropping below the post, etc.

Prefers-color-scheme Feature

One of the media features, called prefers-color-scheme, is used to determine whether the user prefers light mode or dark mode. It's based on their device settings, and most browsers support it.

You specify your "base" styles first - whatever you want applied no matter the device setting - and then you can override those based on whether the visitor prefers light or dark mode. Here's the code I used for the images above:

<style type="text/css">
    #pic {
        margin: auto;
        height: 400px;
        width: 400px;
        background-image: url("https://grantwinney.com/content/images/2020/02/sun.png");
        background-size: 360px 360px;
        background-repeat: no-repeat;
        background-position: center;
    }

    @media (prefers-color-scheme: light), (prefers-color-scheme: no-preference) {
        #pic {
            background-color: skyblue;
            background-image: url("https://grantwinney.com/content/images/2020/02/sun.png");
        }
    }

    @media (prefers-color-scheme: dark) {
        #pic {
            background-color: midnightblue;
            background-image: url("https://grantwinney.com/content/images/2020/02/moon.png");
        }
    }
</style>

<div id="pic"></div>

Notice that there's a setting for when they haven't indicated a preference, and I used the , to indicate an "or" clause, meaning (in my example) that the sun image and skyblue background will show up if they've selected "light" mode or nothing at all.

And just in case it doesn't work for you, which probably means you're either using an unsupported browser or your device doesn't pass those settings to the browser, this is how it looks in Windows when I toggle between modes. In clockwise order is DuckDuckGo who has a whole separate theme, the MDN example I loved so much, the Windows settings for dark mode, aaaand.. some new-agey sun/moon example someone put together.

↧

The command “eval git fetch origin +refs/pull/8/merge:” failed

Like any decent dev shop, we employ continuous builds for our projects. I even use it for one-off projects like GhostSharp, to make sure any code I'm committing compiles correctly (when applicable) and that the tests all pass (always applicable).

A while back, I got an unexpected error from a Travis CI build that threw me a bit. It was trying to build my branch, for which there had been a PR, but it failed:

The command "eval git fetch origin +refs/pull/8/merge:" failed 3 times.

The full trace of the error was similar to this:

$ git fetch origin +refs/pull/8/merge:

fatal: Couldn't find remote ref refs/pull/8/merge
Unexpected end of command stream
The command "eval git fetch origin +refs/pull/8/merge:" failed. Retrying, 2 of 3.

fatal: Couldn't find remote ref refs/pull/8/merge
Unexpected end of command stream
The command "eval git fetch origin +refs/pull/8/merge:" failed. Retrying, 3 of 3.

fatal: Couldn't find remote ref refs/pull/8/merge
Unexpected end of command stream
The command "eval git fetch origin +refs/pull/8/merge:" failed 3 times.

The command "git fetch origin +refs/pull/8/merge:" failed and exited with 128.

Your build has been stopped.

What I eventually realized was that Travis CI (apparently) kicked off the job based on the PR that I had created against the branch (the pull/8 part in the above error), and I had just merged the PR into master. The branch wasn't deleted, but the PR was merged and closed.

Unfortunately, it was quite a while ago and I can't remember exactly what I did to fix it, but I either reopened the PR through GitHub and kicked off a build, or just kicked off a build against the (still-open) branch manually.

↧

How to find the iCal address for a public Google calendar

If you already know why you're here, then just plug in the public URL (or the calendar ID) from the calendar settings page and click the appropriate button to get the iCal link. For everyone else, scroll past the text boxes for a brief explanation...

Public URL:
Calendar ID:
Converted:

I got my calendar, now learn me more!

Impressive, you didn't just run off. This won't take too long, I promise.

When someone creates a public Google calendar and shows it off to the world, you'll usually see a little "+ Google Calendar" button in the lower-right corner. Click on that, and you can import the calendar into your own Google account! How exciting!!

Unless you've replaced it with another service, like I did. 😐

Shockingly, you might not want to import a Google calendar into a Google account. So you might double-check the calendar settings, and see a "Public URL". And you might think you can import that into another client. You might be very wrong.

The reason it can't be imported is that the Public URL from Google is really just a link to an HTML page, and other clients don't know what the heck to do with it. You need something that's standardized, that all clients can easily consume and do something with. You need an iCalendar file, which (if you open it up in a text editor) looks something like this:

BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-TIMEZONE:UTC

BEGIN:VEVENT
DTSTART;VALUE=DATE:20210402
DTEND;VALUE=DATE:20210403
DTSTAMP:20200227T163343Z
UID:20210402_60o30dr5coo30c1g60o30dr56k@google.com
CLASS:PUBLIC
CREATED:20190517T221201Z
DESCRIPTION:Holiday or observance in: Connecticut\, Hawaii\, Delaware\, Ind
 iana\, Kentucky\, Louisiana\, New Jersey\, North Carolina\, North Dakota\, 
 Tennessee\, Texas
LAST-MODIFIED:20190517T221201Z
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:Good Friday (regional holiday)
TRANSP:TRANSPARENT
END:VEVENT

BEGIN:VEVENT
DTSTART;VALUE=DATE:20200410
DTEND;VALUE=DATE:20200411
DTSTAMP:20200227T163343Z
UID:20200410_60o30dr5coo30c1g60o30dr56g@google.com
CLASS:PUBLIC
CREATED:20190517T221201Z
DESCRIPTION:Holiday or observance in: Connecticut\, Hawaii\, Delaware\, Ind
 iana\, Kentucky\, Louisiana\, New Jersey\, North Carolina\, North Dakota\, 
 Tennessee\, Texas
LAST-MODIFIED:20190517T221201Z
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:Good Friday (regional holiday)
TRANSP:TRANSPARENT
END:VEVENT

...
...

BEGIN:VEVENT
DTSTART;VALUE=DATE:20191224
DTEND;VALUE=DATE:20191225
DTSTAMP:20200227T163343Z
UID:20191224_60o30dr56ko30c1g60o30dr56c@google.com
CLASS:PUBLIC
CREATED:20140108T163258Z
DESCRIPTION:
LAST-MODIFIED:20140108T163258Z
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:Christmas Eve
TRANSP:TRANSPARENT
END:VEVENT

BEGIN:VEVENT
DTSTART;VALUE=DATE:20191102
DTEND;VALUE=DATE:20191103
DTSTAMP:20200227T163343Z
UID:20191102_60o30c9g6ko30c1g60o30dr56c@google.com
CLASS:PUBLIC
CREATED:20140108T163258Z
DESCRIPTION:
LAST-MODIFIED:20140108T163258Z
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:All Souls' Day
TRANSP:TRANSPARENT
END:VEVENT

END:VCALENDAR

Predictable, no?
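One wrinkle in the sample above: long values such as DESCRIPTION are folded across lines, and each continuation line starts with a space. Here is a minimal Python sketch of how a client might read such a file. The function names are made up for illustration, and it deliberately ignores escaping, time zones, and most of the format.

```python
def unfold(text):
    """Join folded lines: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            # Continuation: drop the single leading whitespace character and append.
            lines[-1] += raw[1:]
        elif raw:
            lines.append(raw)
    return lines


def list_events(text):
    """Return (DTSTART, SUMMARY) pairs, one per VEVENT block."""
    events, current = [], None
    for line in unfold(text):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append((current.get("DTSTART"), current.get("SUMMARY")))
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ";VALUE=DATE" from the property name.
            current[name.split(";", 1)[0]] = value
    return events
```

Run against the sample, `list_events` would yield pairs like `("20191224", "Christmas Eve")`. A real client should use a proper iCalendar library rather than something this naive.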

As luck would have it, even though they don't make it obvious, you can easily extract the Calendar ID from any Google calendar URL (or just use the calendar ID directly if you know it), and replace {CALENDAR_ID} in the following URL. Or use the script I wrote, above.

https://calendar.google.com/calendar/ical/{CALENDAR_ID}/public/basic.ics
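The substitution is easy to sketch in Python. This is not the script from this page, just an illustration: it assumes the public URL carries the calendar ID in a `src` query parameter, and both function names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs, quote

ICAL_TEMPLATE = "https://calendar.google.com/calendar/ical/{calendar_id}/public/basic.ics"


def calendar_id_from_public_url(public_url):
    """Pull the calendar ID out of a public calendar URL.

    Assumption: the ID is carried in the `src` query parameter.
    """
    query = parse_qs(urlparse(public_url).query)
    ids = query.get("src")
    if not ids:
        raise ValueError("no 'src' parameter found in URL")
    return ids[0]


def ical_url(calendar_id):
    """Substitute the (URL-encoded) calendar ID into the iCal URL pattern."""
    return ICAL_TEMPLATE.format(calendar_id=quote(calendar_id, safe=""))
```

The ID is percent-encoded on the way back in because calendar IDs usually contain characters like `@` and `#`, which shouldn't appear raw in a URL path.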

Voilà! (that's Internet speak for "that's all folks!") 🐷

↧

Replacing Google Analytics, respecting user privacy, and owning your data

Whenever you visit a web page, your browser includes some basic info about your environment in the request, like your IP address, screen resolution, the page you're requesting (duh), the page you came from, etc. You can see it in action here or here. Individual websites can log that information to see which pages (or posts in the case of bloggers) are the most popular over time, which sites are linking to their site, etc. It's pretty useful stuff.

Feeding the Machine

Well, the IP address part is a little creepy. It's an easy way to track someone doing something shady (if they're not using a VPN), but it's also an easy way for an advertiser to track you across the web. Wait, how's that possible? It's not like advertisers have access to all those servers. I can log in to my server and view logs about visitors, but advertisers can't. Enter Google Analytics.

Just plug Google's code into the header of every page on your site, and you get access to things you already had access to, along with questionable stuff like demographics (gender, age, etc). But how?

That code sends all your private visitor data to Google, who slurps it up, feeds it to their big data monster, combines it with all the data coming from millions of other sites running the same code, along with cookies and identifiers, their DoubleClick tracker, etc, and presents a bunch of stats to you.

Like I said, advertisers couldn't have access to all that data unless you voluntarily sent it to them. In essence, you're trading your visitors' privacy for some stats that you might not even need or understand, but Google absolutely understands it all, and happily uses it to help advertisers serve up ads across the web. You can read more about where demographics and interests data comes from as well as a tracking code overview, straight from the horse's mouth.

Starving the Machine

There's another way. For the vast majority of bloggers and small website owners, the data you're interested in is already at your fingertips. Do you really need Google to tell you whether your content is being consumed by men, women, kids, or any other particular group? Do you even care, as long as someone finds it useful?

Personally, I'm only interested in which pages are most popular (so I can invest my time wisely in updating posts), who's referring visitors to my site (it might indicate a site where I should engage more), and how many total visitors I'm getting (if I choose to display an ad for some service I think is useful, it might be helpful to tell them I get xx number of visitors per month). Aaaaand.. that's about it. I don't care about your IP address or anything else.
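Popular pages and referrers can be pulled straight out of a web server's access log, with no third party involved. Here's a rough Python sketch, assuming the "combined" log format (nginx's default); the regex and the `summarize` function are illustrative, not taken from any of the tools mentioned here. It never looks at the IP address.

```python
import re
from collections import Counter

# Combined log format:
# IP identity user [time] "request" status size "referrer" "user agent"
LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def summarize(log_lines):
    """Count successful GET requests per page and per referrer."""
    pages, referrers = Counter(), Counter()
    for line in log_lines:
        match = LINE.match(line)
        # Skip unparseable lines, non-GET requests, and anything but a 200.
        if not match or match["method"] != "GET" or match["status"] != "200":
            continue
        pages[match["path"]] += 1
        if match["referrer"] not in ("", "-"):
            referrers[match["referrer"]] += 1
    return pages, referrers
```

Calling `pages.most_common(10)` on the result gives a top-ten list. A real log also contains requests for images, stylesheets, and bots, so the raw counts would need filtering before they mean much.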

There's a number of services that are more privacy-minded than Google Analytics, though admittedly that particular bar is pretty low. Just to name a few, Simple Analytics starts at $10/mo, Matomo is $20/mo or free to self-host, and GoAccess is a free self-hosted solution too. Another one is Fathom, which is $12/mo or free to self-host.

Replacing the Machine

I looked at a few, and ultimately settled on Fathom. The interface is clean, they don't use cookies to track visitors, and they even respect the "Do Not Track" setting in your browser. First though, I gotta say that one of my favorite features on DigitalOcean is the ability to take a snapshot of a server before making a change, and if installing some piece of software goes horribly wrong, a restore is only one click away. 😅

The instructions for installing Fathom are pretty straightforward. If you want to deploy a brand new instance, try out DigitalOcean's Fathom Analytics droplet. Since I'm running this site on a pre-existing Ubuntu server that's already running Ghost, I had to take some extra steps. I'll toss them out here - whether or not they're helpful depends on your setup.

  • I selected the latest "fathom_1.2.1_linux_amd64.tar.gz" release.
  • I followed the "Configuring Fathom" section - even though it says it's optional, it seems to be required in the section for setting up an admin user.
  • I created the "my-fathom-site" file for SSL like they suggested, but ended up copying some of the details from my blog's SSL file, since I use Let's Encrypt and wanted to access the Fathom dashboard over SSL as well.
  • I configured UFW to block port 9000 (the default port), and to allow the secure port I configured for the dashboard.

And here's 10,000 words.. er, 10 images comparing the results of Google to Fathom. In general, the numbers are similar though definitely not identical. General trends and spikes throughout the day seem on par. Popular pages (and referrals, not shown in the Google captures) are similar enough too, as is bounce rate (probably easy to calculate if the "referrer" is the same site).

The page views and numbers are fairly close, although they diverge when comparing several days. The numbers look a bit higher with Fathom, but that could be caused by adblockers (and possibly even some browsers) that block Google Analytics.

The average time per page seems to consistently be about half (or less) on Fathom than what Google reports. That's a metric I can't really wrap my head around though. It seems at best a guess, since there's an HTTP request when coming in to the site, but there's no indication of when a visitor leaves... or just closes the tab. Maybe if someone hits "back" to try the next hit on Google's search page, they can make a reasonable guess as to when a visitor left your page.
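To see why it can only be a guess, here's a small Python sketch of one plausible way to estimate it: take the gap between a visitor's successive page views. This is an assumption for illustration only; nothing here describes how Fathom or Google actually calculate the metric, and `time_on_pages` is a made-up name.

```python
from datetime import datetime


def time_on_pages(views):
    """Estimate seconds spent on each page of one visitor's session.

    `views` is a list of (timestamp, path) tuples sorted by time. The last
    page gets None, because nothing marks when the visitor left it.
    """
    estimates = []
    for (start, path), (next_start, _) in zip(views, views[1:]):
        estimates.append((path, (next_start - start).total_seconds()))
    if views:
        estimates.append((views[-1][1], None))
    return estimates
```

The final page of every visit has no measurable duration, and a single-page visit has none at all. How a tool fills that gap (ignore it, count it as zero, or estimate it some other way) could easily account for two tools reporting very different averages.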

Anyway, I'm happy enough with the results to disable Google Analytics. Thanks Fathom! (disclaimer: IMO, YMMV, BOGO, YOLO, and any other acronyms you like...)

↧

Yes, it's possible to test a WinForms app... using MVP

If you find yourself in a position where you're supporting a WinForms application, you're likely to notice the tests... or lack thereof. Just because we may not have been so focused on automated tests and continuous integration when WinForms was younger, that doesn't mean we can't introduce them now. Better late than never!

Let's say you had an absurdly simple Form, like this one. It has 3 text boxes to enter values (why? I dunno, it's pre-beta!), and an ADD button to, well you know, add them in the bottom box. The "Running Total" field never resets, but just keeps adding each total as long as the app is running.

Assume the above is implemented like this.. a relatively short bit of code. Too bad none of these methods can take advantage of automated testing. You'd need an instance of the Form itself, and every method is accessing or otherwise updating UI components. That won't do!

public partial class CalcForm : Form
{
    public CalcForm()
    {
        InitializeComponent();
    }

    private void btnAdd_Click(object sender, EventArgs e)
    {
        decimal total = 0;
        
        total += SafeGetNumber(txtNumber1);
        total += SafeGetNumber(txtNumber2);
        total += SafeGetNumber(txtNumber3);
        
        txtTotal.Text = total.ToString();
        txtRunningTotal.Text = (SafeGetNumber(txtRunningTotal) + total).ToString();
    }

    private void btnReset_Click(object sender, EventArgs e)
    {
        txtNumber1.Text = txtNumber2.Text = txtNumber3.Text = txtTotal.Text = "";

        txtNumber1.Focus();
    }
    
    private decimal SafeGetNumber(TextBox tb)
    {
        return decimal.TryParse(tb.Text, out decimal res) ? res : 0;
    }
}

What is MVP?

In a nutshell, it's one of many patterns (MVP, MVC, MVVM, etc) that all try to do the same thing - separate the UI from the business logic. There are multiple reasons for this, but right now I'm focusing on the fact it makes it easier to test the business logic.

MVP achieves this in 3 parts - a View, a Presenter, and a Model... and some interfaces thrown in for good measure. A quick disclaimer first - no doubt there are more ways to implement MVP than what I'm about to present, but keep in mind the end goal. We want to separate the UI from most of the rest of the code.

NOTE: If you want to try this out yourself, get the code from GitHub.

The View

The "View" represents the Form itself, and it includes an interface that represents everything you might need to get from (or set to) the Form, which is then used by the "Presenter" (more on that later) to tell it what to display next. Whereas before the View (your Form) had all the code neatly tucked away inside it, it's now very bare.

Here's how I converted the Form. Note that it's actually doing nothing intelligent now, other than wiring up all the UI components to various properties of the interface. Also note that I took the button click event handlers out of the designer file (where they automatically get created), and made those part of the interface as well. When a button's clicked, the Presenter will know about it, and can act on it.

public interface ICalcView
{
    event EventHandler Add;
    event EventHandler Reset;
    string Value1 { get; set; }
    string Value2 { get; set; }
    string Value3 { get; set; }
    string Total { set; }
    string RunningTotal { set; }
    void Show();
}

public partial class CalcForm : Form, ICalcView
{
    public event EventHandler Add;
    public event EventHandler Reset;

    public CalcForm()
    {
        InitializeComponent();

        btnAdd.Click += delegate { Add?.Invoke(this, EventArgs.Empty); };
        btnReset.Click += delegate
        {
            Reset?.Invoke(this, EventArgs.Empty);
            txtNumber1.Focus();
        };
    }

    string ICalcView.Value1
    {
        get => txtNumber1.Text;
        set => txtNumber1.Text = value;
    }
    string ICalcView.Value2
    {
        get => txtNumber2.Text;
        set => txtNumber2.Text = value;
    }
    string ICalcView.Value3
    {
        get => txtNumber3.Text;
        set => txtNumber3.Text = value;
    }

    public string Total
    {
        set => txtTotal.Text = value;
    }
    public string RunningTotal
    {
        set => txtRunningTotal.Text = value;
    }
}

The Model

The "Model" represents some object that you're operating on. In my case, I made the Model a sort of calculator object that stores the totals and does the actual summing up. What you put in here is a bit subjective, but just keep the end goal in mind.

public interface ICalcModel
{
    decimal Total { get; }
    decimal RunningTotal { get; }
    void CalculateTotal(List<decimal> numbers);
}

public class CalcModel : ICalcModel
{
    public decimal Total { get; private set; }
    public decimal RunningTotal { get; private set; }

    public void CalculateTotal(List<decimal> numbers)
    {
        Total = numbers.Sum();
        RunningTotal += Total;
    }
}

The Presenter

So far, we've got a View that displays nothing, and a Model that stores numbers but can't do much else. What's the glue that ties them together? I present.. the Presenter!

The Presenter doesn't have an interface, at least not the way I designed it. But it does accept the interfaces that the View and Model implement, and it operates on those. It orchestrates everything, subscribing to events in the View, getting data from the View, passing that data to the Model, and moving things back and forth as needed.

Note that it doesn't actually touch the UI though.. just passes things back to the View, which pops it into the UI where the user can see it.

public class CalcPresenter
{
    readonly ICalcView view;
    readonly ICalcModel model;

    public CalcPresenter(ICalcView view, ICalcModel model)
    {
        this.view = view;
        this.model = model;
        this.view.Add += Add;
        this.view.Reset += Reset;
        this.view.Show();
    }

    public void Add(object sender, EventArgs e)
    {
        model.CalculateTotal(new List<string> { view.Value1, view.Value2, view.Value3 }.ConvertAll(TryGetNumber));

        view.Total = Convert.ToString(model.Total);
        view.RunningTotal = Convert.ToString(model.RunningTotal);
    }

    public void Reset(object sender, EventArgs e)
    {
        view.Value1 = view.Value2 = view.Value3 = view.Total = "";
    }

    public decimal TryGetNumber(string input)
    {
        return decimal.TryParse(input, out decimal res) ? res : 0;
    }
}

Why should I care?

"Ugh", you might be thinking. "This is sooOOooOOooOo much longer than before", you might be thinking. You're right, it is. But it's also a lot more intentional, and a lot more separated out. And it allows us to mock the interfaces and thoroughly test the logic in the Presenter and Model, like this.

[TestFixture]
public class CalcPresenterTests
{
    Mock<ICalcView> mockView;
    Mock<ICalcModel> mockModel;
    CalcPresenter presenter;

    [SetUp]
    public void Setup()
    {
        mockModel = new Mock<ICalcModel>();
        mockView = new Mock<ICalcView>();
        presenter = new CalcPresenter(mockView.Object, mockModel.Object);
    }

    [Test]
    public void AddTest()
    {
        mockView.SetupGet(x => x.Value1).Returns("10");
        mockView.SetupGet(x => x.Value2).Returns("20");
        mockView.SetupGet(x => x.Value3).Returns("30");
        mockModel.SetupGet(x => x.Total).Returns(60m);
        mockModel.SetupGet(x => x.RunningTotal).Returns(100m);

        presenter.Add(null, null);

        mockModel.Verify(x => x.CalculateTotal(It.IsAny<List<decimal>>()), Times.Once);
        mockView.VerifySet(x => x.Total = "60", Times.Once);
        mockView.VerifySet(x => x.RunningTotal = "100", Times.Once);
    }

    [Test]
    public void ResetTest()
    {
        presenter.Reset(null, null);

        mockView.VerifySet(x => x.Value1 = "", Times.Once);
        mockView.VerifySet(x => x.Value2 = "", Times.Once);
        mockView.VerifySet(x => x.Value3 = "", Times.Once);
        mockView.VerifySet(x => x.Total = "", Times.Once);
        mockView.VerifySet(x => x.RunningTotal = It.IsAny<string>(), Times.Never);
    }

    [Test]
    [TestCase("3", 3)]
    [TestCase("-3.22", -3.22)]
    [TestCase("0", 0)]
    [TestCase("", 0)]
    [TestCase("bad input!!", 0)]
    public void TryGetNumberReturnsExpectedValue(string input, decimal output)
    {
        Assert.AreEqual(output, presenter.TryGetNumber(input));
    }
}

[TestFixture]
public class CalcModelTests
{
    CalcModel model;

    [SetUp]
    public void Setup()
    {
        model = new CalcModel();
    }

    [Test]
    [TestCase(-13, -1, -1, -1, -10)]
    [TestCase(0, 0, 0, 0)]
    [TestCase(15, 1, 2, 3, 4, 5)]
    public void AddingNumbersGeneratesExpectedTotal(decimal expectedTotal, params int[] inputs)
    {
        model.CalculateTotal(inputs.Select(Convert.ToDecimal).ToList());

        Assert.AreEqual(expectedTotal, model.Total);
    }

    [Test]
    public void AddingNumbersTwiceRetainsLastOnly()
    {
        model.CalculateTotal(new List<decimal> { 1, 2, 3 });
        model.CalculateTotal(new List<decimal> { 10, 20, 30 });

        Assert.AreEqual(60, model.Total);
    }

    [Test]
    public void AddingNumbersTwiceIncreasesRunningTotal()
    {
        model.CalculateTotal(new List<decimal> { 1, 2, 3 });
        Assert.AreEqual(6, model.RunningTotal);

        model.CalculateTotal(new List<decimal> { 10, 20, 30 });
        Assert.AreEqual(66, model.RunningTotal);
    }
}

The end result? The beginnings of an automated test suite! You can plug this into TeamCity, Jenkins, or another CI tool and begin to get automated test runs. Yes, this is a lot more difficult in a large app that's been around for years, but with effort it's absolutely doable, one step at a time.

↧

SO Vault: Break statements in the real world

StackOverflow sees quite a few threads deleted, usually for good reasons. Among the stinkers, though, lies the occasionally useful or otherwise interesting one, deleted by some pedantic nitpicker - so I resurrect them. 👻

Note: Because these threads are older, info may be outdated and links may be dead. Feel free to contact me, but I may not update them... this is an archive after all.


Break statements in the real world

Question asked by Lodle

I've been having a discussion on Whirlpool about using break statements in for loops. I have been taught, and have also read elsewhere, that break statements should only be used with switch statements and, on rare occasions, with while loops.

My understanding is that you should only use for loops when you know the number of times that you want to loop, for example do work on x elements in an array, and while loops should be used every other time. Thus a for loop with a break can be easily refactored into a while loop with a condition.

At my university, you will instantly fail an assignment if you use break anywhere but in a switch statement as it breaks the coding guideline of the university. As I'm still completing my software engineering degree I would like to know from people in the real world.

Comments

it would depend how you use it. In my opinion those who say " never use" are wrong. Even goto statement has its uses. – Anycorn Apr 17 '10 at 3:31

@DR Well not goto. You're kind of crossing the line there. Goto is more like horseradish sauce - hardly at all if any. – bobobobo Jun 19 '10 at 17:24

I find it hard to believe there's a real university which enforces uniform coding standards on all courses. – shoosh Jun 28 '10 at 15:37

Unfortunately, all too often the university isn't anything like the real world. – Loren Pechtel Jul 28 '10 at 4:47

your uni lecturers sound like academic muppets. I recall my uni lecturers. None of them could code for shit. Those who can, do, those who can't - teach! – user206705 Nov 17 '10 at 17:40


Answer by paxdiablo

These generalized rules are rubbish as far as I'm concerned. Use what the language allows in the real world as long as it aids (or doesn't degrade) readability. The guideline against using break is no different to that against using goto. The reason people dislike them is that they may lead to spaghetti code that is hard to follow.

Note the use of two phrases in that sentence above: The first was "guideline" instead of rule - the only rules are those imposed by the standards. Guidelines are for best practices but you have to understand the reasons behind them, not just follow them blindly.

The second was "may lead to" rather than "does lead to". There are situations where break and its brethren actually lead to more readable code than the alternative (which is often a hugely ugly condition in the looping statement).

For example, they make a lot of sense in finite state machines.

As some have pointed out, break can lead to post-conditions of a loop being variable. By that, I mean that:

for (i = 0; i < 50; i++) {
    if (someCondition) {
        break;
    }
}

can leave i holding any value from 0 to 50 after the loop, depending on when (or whether) the break fired.

But you should keep in mind that this only matters if you actually care what i is set to after the loop. If the next statement is:

for (i = 0; i < 50; i++) { ... }

then it doesn't matter at all.

A piece of code like:

while (x != 0) {
    y = doSomethingWith (x);
    if (y == 0) break;
    process (y);

    z = doSomethingElseWith (x);
    if (z == 0) break;
    process (z);

    x--;
}

violates this guideline and can be refactored into something that doesn't, but there is nothing unreadable about this piece of code. You can clearly see all flows of control at a single glance.

You should use the language features that make sense to your situation. Where guidelines should be enforced and where they should be ignored comes with experience.


Answer by Norman Ramsey (Apr 17, 2010)

I've been told by professors and peers at my university that using the break statement is bad practice

Come visit Tufts and our professors will tell you otherwise.

The arguments against break boil down to one principle: break requires non-local reasoning, and a language with break requires a much more complicated semantic framework than a language without break. (For the experts in the room, instead of using simple tools like predicate transformers or Hoare logic, you have to reach for something like continuations, or at the very least, a context semantics.)

The problem with this argument is that it puts simplicity of semantics ahead of programmers' real needs. There are lots of programs with natural loops that have more than one exit. Programming languages need to support these loops in a way that is more effective than introducing extra Boolean variables to govern the control flow.

For some expert testimony on the value of multiple exits from control-flow constructs, I recommend two papers:

  • Structured Programming With goto Statements by Donald E. Knuth. Don goes to great lengths to explain why certain kinds of gotos should be allowed in Pascal. Most of these gotos are equivalent to some form of break, which hadn't quite been invented yet when Don wrote the paper.

  • Exceptional Syntax by Nick Benton and Andrew Kennedy. The topic may seem unrelated, but throwing an exception is a nonlocal exit, just like break. (In Modula-3, break was defined to be an exception.) It's a great paper showing how language designers need to be more imaginative in designing syntax to support multiple exits.

If you really want to annoy your professors, ask them if the return statement is bad practice. If they say "no", you've got them: "But isn't return a control operator, just like break? And isn't it the case that introducing return into a structured program creates all the same semantic difficulties that introducing break does?" Watch them squirm.

Is using the break statement bad practice?

No. The break statement is a valuable tool in your toolbox, just like return or exceptions. Like other tools, it can be misused, but there is nothing inherently bad about it, and in fact the break statement is pretty easy to use in sane and sensible ways.

Your professors should learn some more powerful semantic methods that can tame the break statement.


Answer by Joel (Apr 17, 2010)

This comes from the idea that there should be one way IN a method and one way OUT. Same with loops. I've had some instructors tell me that I shouldn't use more than one return or any break/continue because it creates "spaghetti code" and it's hard to follow the path. Instead, they say to set a flag and use an if statement rather than just break out. I completely disagree with this idea. I think in a lot of cases having more than one return or a break/continue statement is much more readable and easier to follow.


Answer by Stephen C

I've been told by professors and peers at my university that using the break statement is bad practice

The first thing to realize is that many of those people have never actually been professional software engineers, and never had to work on a large code base written by many developers over many years. If you do this, you learn that simplicity, clarity, consistency and use of accepted idioms are more important in making code maintainable than dogma like avoiding break/continue/multiple return.

I personally have no problems reading and understanding code that uses break to get out of loops. The cases where I find a break unclear tend to be cases where the code needs to be refactored; e.g. methods with high cyclomatic complexity scores.

Having said that, your professors have the right motivation. That is, they are trying to instill in you the importance of writing clear code. I hope they are also teaching you about the importance of consistent indentation, consistent line breaking, consistent white space around operators, identifier case rules, meaningful identifiers, comments and so on ... all of which are important to making your code maintainable.


Answer by Greg (Oct 19, 2008)

I don't see any harm in using break - it's useful and simple. The exception is when you have a lot of messy code inside your loop, it can be easy to miss a break tucked away in 4 levels of ifs, but in this case you should probably be thinking about refactoring anyway.

Edit: IMHO it's much more common to see break in a while than a for (although seeing continue in a for is pretty common) but that doesn't mean it's bad to have one in a for.


Answer by Greg B

I think it's a completely pompous and ridiculous rule to enforce.

I often use break within a for loop. If I'm searching for something in an array and don't need to keep searching once I find it, I will break out of that loop.

I agree with @Konrad Rudolph above, that any and all features should be used as and when the developer sees fit.

In my eye, a for loop is more obvious at a glance than a while. I will use a for over a while any day unless a while is specifically needed. And I will break from that for if logic requires it.


Answer by Richard Harrison (Oct 19, 2008)

My rule is to use any and all features of the language where it doesn't produce obscure or unreadable code.

So yes, I do on occasion use break, goto, continue


Answer by Rob Walker (Oct 19, 2008)

I often use break inside a for loop.

The advantage of a for loop is that the iterator variable is scoped within the expression. If a language feature results in fewer lines of code, or even less indented code, then IMHO it is generally a good thing and should be used to improve readability.

e.g.

for (ListIt it = ...; it.Valid(); it++)
{
  if (it.Curr() == ...)
  {
     .. process ...
     break;
  }
}

Rewriting this using a while loop would require several more lines, and leak the iterator out of the scope of the loop.

(Pedantic points: I only want to act on the first match, and the condition being evaluated isn't suitable for any Find(...) method the list has).


Answer by Cervo (Oct 19, 2008)

Break is useful for avoiding nesting. Also there are many times that it is useful to prematurely exit a loop. It also depends on the languages. In languages like C and Java a for loop basically is a while loop with an initialization and increment expression.

Is it better to do the following (assume no short-circuit evaluation)?

list = iterator on something
while list.hasItem()
  item = list.next()
  if item passes check
    if item passes other check
      do some stuff
      if item passes other check
        do some more stuff
        if item is not item indicating end of list
          do some more stuff
        end if
      end if
    end if
  end if
end while

or is it better just to say

while list.hasItem()
     item = list.next()
     if check fails continue
       .....
     if checkn fails continue
     do some stuff
     if end of list item checks break
end while

For me it is better to keep the nesting down and break/continue offer good ways to do that. This is just like a function that returns multiple times. You didn't mention anything about continue, but in my opinion break and continue are of the same family. They help you to manually change loop control and are great at helping to save nesting.
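The nesting argument above can be sketched in Python (my choice of language for illustration; the function names and the three checks are hypothetical stand-ins for "item passes check"). Both functions produce the same result, but the second keeps the main action at a single indentation level by using continue as a guard.

```python
def collect_nested(items):
    """Each check adds another level of indentation."""
    results = []
    for item in items:
        if isinstance(item, int):
            if item > 0:
                if item % 2 == 0:
                    results.append(item * 10)
    return results


def collect_flat(items):
    """The same checks written as guards that skip to the next item."""
    results = []
    for item in items:
        if not isinstance(item, int):
            continue
        if item <= 0:
            continue
        if item % 2 != 0:
            continue
        results.append(item * 10)
    return results
```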

Another common pattern (I actually see this in university classes all the time for reading files and breaking apart strings) is

currentValue = some function with arguments to get value
while (currentValue != badValue) {
    do something with currentValue
    currentValue = some function with arguments to get value
}

is not as good as

while (1) {
    currentValue = some function with arguments to get value
    if (currentValue == badValue)
       break
    do something with currentValue
}

The problem is that you are calling the function with arguments to create currentValue twice. You have to remember to keep both calls in sync. If you change the arguments for one but not the other you introduce a bug. You mention you are getting a degree in software engineering, so I would think there would be emphasis on not repeating yourself and creating easier to maintain code.
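A runnable version of the two patterns above, as a sketch in Python under stated assumptions: the function names and CHUNK_SIZE are hypothetical, and an in-memory io.StringIO stands in for a real file. The first function has two call sites that must stay in sync; the second has one.

```python
import io

CHUNK_SIZE = 4  # hypothetical chunk size, chosen only for the demo


def read_chunks_duplicated(stream):
    """Priming read before the loop plus an identical read at the bottom."""
    chunks = []
    chunk = stream.read(CHUNK_SIZE)      # call site 1
    while chunk != "":
        chunks.append(chunk)
        chunk = stream.read(CHUNK_SIZE)  # call site 2, must match call site 1
    return chunks


def read_chunks_with_break(stream):
    """A single read call; the exit test sits in the middle of the loop."""
    chunks = []
    while True:
        chunk = stream.read(CHUNK_SIZE)  # the only call site
        if chunk == "":
            break
        chunks.append(chunk)
    return chunks


demo = read_chunks_with_break(io.StringIO("abcdefghij"))
```

If the read arguments ever change, the second version needs exactly one edit.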

Basically anyone who says any control structure is bad and completely bans it is being closed-minded. Most structures have a use. The biggest example is GOTO. A lot of people abused it and jumped in the middle of other sub procedures, and basically jumped forwards/backwards all over the code and gave it a bad name. But GOTO has its uses. Using GOTO to exit a loop early was a good use, now you have break. Using GOTO to centralize exception handling was another good use. Now you have try/catch exception handling in many languages. In assembly there is only GOTO for the most part. And using that you can create a disaster. Or you can create our "structured" programming structures. In truth I generally don't use GOTO except in Excel VBA because there is no equivalent to continue (that I know of) and error handling code in VB 6 utilizes goto. But I still would not absolutely dismiss the control structure and say never...

Unfortunately the reality is that if you don't want to fail, you will have to avoid using break. It is unfortunate that university doesn't have more open minded people in it. To keep the level of nesting down you can use a status variable.

status variable = true
while condition and status variable = true
  do stuff
  if some test fails
    status variable = false
  if status variable = true
     do stuff
  if some test fails
     status variable = false
  ....
end while

That way you don't end up with huge nesting.


Answer by Duck (Apr 17, 2010)

Makes switch statements a whole lot easier.


Answer by Sol (Oct 19, 2008)

I would argue your teachers' prohibition is just plain poor style. They are arguing that iterating through a structure is a fundamentally different operation than iterating through the same structure but maybe stopping early, and thus should be coded in a completely different way. That's nuts; all it's going to do is make your program harder to understand by using two different control structures to do essentially the same thing.

Furthermore, in general avoiding breaks will make your program more complicated and/or redundant. Consider code like this:

for (int i = 0; i < 10; i++)
{
   // do something
   if (check on i) 
        break;
   // maybe do something else
}

To eliminate the break, you either need to add an additional control boolean to signal it is time to finish the loop, or redundantly check the break condition twice, once in the body of the loop and once in the loop's control statement. Both make the loop harder to understand and introduce more opportunities for bugs without buying you any additional functionality or expressiveness. (You also need to hoist the declaration of i out of the loop's control structure, adding another scope around the entire mess.)
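As a rough illustration of that cost, here is the same loop shape written both ways in Python (a sketch; the function names and the log entries are hypothetical placeholders for "do something" and "maybe do something else"). The flag version needs an extra variable, a counter hoisted out of the loop header, and an else branch to guard the second half of the body.

```python
def run_with_break(limit, stop_at):
    """Loop body with work before and after the exit check."""
    log = []
    for i in range(limit):
        log.append(f"before {i}")
        if i == stop_at:
            break
        log.append(f"after {i}")
    return log


def run_with_flag(limit, stop_at):
    """Same behaviour without break: extra flag, hoisted counter, else branch."""
    log = []
    i = 0
    done = False
    while i < limit and not done:
        log.append(f"before {i}")
        if i == stop_at:
            done = True
        else:
            log.append(f"after {i}")
            i += 1
    return log
```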

If the loop is so big you cannot easily follow the action of the break statement, then you'd be better off refactoring the loop than adding to its complexity by removing the break statement.


Answer by stakx

Edit: First of all, sorry if this answer seems somewhat long-winded. However, I'd like to demonstrate where my expressed opinion about the break statement (bold text at the end of my answer) comes from.


One very good use case for the break statement is the following. In loops, you can usually check a break condition in either of two places: Either at the loop's beginning, or at the loop's end:

while (!someBreakCondition)
{
    ...
}

do
{
    ...
} while (!someBreakCondition);

Now, what do you do when, for some reason, you cannot put your break condition in either of these places, because e.g. you first need to retrieve a value from somewhere before you can compare it to some criterion?

// assume the following to be e.g. some API function that you cannot change:
void getSomeValues(int& valueA, int& valueB, int& valueC)
//                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
//                           out parameters
{
    ...
}

while (true)
{
    int a, b, c;
    getSomeValues(a, b, c);
    bool someBreakCondition = (b == ...);

    ...   // <-- we do some stuff here
}

In the above example, you could not check for b directly inside the while header, because it must first be retrieved from somewhere, and the way to do that is via a void function. No way to put that into the while break condition.

But it may also be too late to wait until the end of the loop for checking the condition, because you might not want to execute all the code in the loop body:

bool someBreakCondition;   // declared outside so the while condition can see it
do
{
    int a, b, c;
    getSomeValues(a, b, c);
    someBreakCondition = (b == ...);

    ...   // <-- we want to do some stuff here, but only if !someBreakCondition
} while (!someBreakCondition);

In this situation, there's two approaches how to avoid executing the ... when someBreakCondition is true:

1) without using break

    ...  // as above

    if (!someBreakCondition)
    {
        ...  // <-- do stuff here
    }
} while (!someBreakCondition);

2) with break

    ...  // as above
    if (someBreakCondition) break;

    ...  // <-- do stuff here
} while (true);

Your professor would obviously favour option 1) (no break). I personally think that's not a nice option, because you have to check the break condition in two places. Option 2) (use break) resolves that problem. Admittedly though, option 1) has the advantage that it becomes more easily recognisable through code indentation under what condition the ... code will execute; with option 2), you have to go up in your source code to find out that program execution might never actually get to some place further down.

In conclusion, I think using or not using break in this (quite frequent) scenario really comes down to personal preference about code legibility.


Answer by Jon Ericson (Oct 21, 2008)

break is an extremely valuable optimization tool and is especially useful in a for loop. For instance:

-- Lua code for finding prime numbers
function check_prime (x)
   local max = x^0.5;

   for v in pairs(p) do
      if v > max then break end;

      if x%v==0 then 
         return false
      end 
   end
   p[x] = true;
   return x 
end

In this case, it isn't practical to set up the for loop to terminate at the right moment. It is possible to re-write it as a while loop, but that would be awkward and doesn't really buy us anything in terms of speed or clarity. Note that the function would work perfectly well without the break, but it would also be much less efficient.

The huge advantage of using break rather than refactoring into a while loop is that the edge cases are moved to a less important location in the code. In other words, the main condition for breaking out of a loop should be the only condition to avoid confusion. Multiple conditions are hard for a human to parse. Even in a while loop, I'd consider using break in order to reduce the number of break conditions to just one.


I'm aware that this is not the most efficient prime checker, but it illustrates a case where break really helps both performance and readability. I have non-toy code that would illustrate it, but require more background information to set it up.
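For readers who don't use Lua, here is a comparable sketch in Python. It is not a translation of the code above: as an assumption of this sketch, known primes are kept in an ascending list rather than as table keys, the names is_prime and primes_up_to are hypothetical, and math.isqrt needs Python 3.8 or later. The break stops trial division once the candidate divisors exceed the square root.

```python
import math


def is_prime(x, known_primes):
    """Trial division of x by known primes, given in ascending order.

    known_primes is assumed to contain every prime up to sqrt(x).
    """
    limit = math.isqrt(x)
    for p in known_primes:
        if p > limit:
            break  # any factor above sqrt(x) pairs with one already tested
        if x % p == 0:
            return False
    return True


def primes_up_to(n):
    """Build the list of primes <= n, reusing it for each new candidate."""
    primes = []
    for candidate in range(2, n + 1):
        if is_prime(candidate, primes):
            primes.append(candidate)
    return primes
```

Removing the break leaves the result unchanged but makes every candidate test against the whole list.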


Answer by Konrad Rudolph (Oct 19, 2008)

If it's a guideline that's enforced by the corrector, you don't really have the choice.

But as guidelines go, this one seems to be excessive. I understand the rationale behind it. Other people argue (in the same vein) that functions must only have one single exit point. This may be helpful because it can reduce control flow complexity. However, it can also greatly increase it.

The same is true for break in for loops. Since all loop statements are basically equivalent in expressive power, one kind can always be substituted for any other. But just because you can doesn't mean that this is good. The most important coding guideline should always be “use your brain!”


Answer by Stefan Rådström (Oct 19, 2008)

In a way I can understand the professor's point of view in this matter, but only as a way to teach the students how to solve problems in a (some kind of) standard fashion. Learn to master these rules, then you are free to break them as you wish, if that will make the code easier to understand, more effective, or whatever.


Answer by Ken Paul (Oct 21, 2008)

  1. While in school, follow the defined guidelines. Some guidelines are arbitrary, and exist primarily for consistency, ease of grading, or to keep within the teacher's limited understanding. The best balance between maximizing learning and maximizing grades is to follow the guidelines.
  2. In the real world, the balance shifts to maximizing benefit for your employer. This usually requires a focus on readability, maintainability and performance. Since programmers rarely agree on what maximizes these qualities, employers typically attempt to enforce even more arbitrary guidelines. Here the stakes are keeping your job and possibly climbing to a leadership position where you can actually influence the standards.

Answer by Vivin Paliath (Apr 17, 2010)

I tend to shy away from breaks. Most of the time, I've found that a flag is sufficient. Of course, this doesn't mean that breaks are always bad. They do have their uses. Your teachers and peers are right about it being a "get out of jail free" card. This is because breaks, like gotos, have a very high potential for abuse. It's often easier to break out of a loop than to figure out how to structure the loop correctly. In most cases, if you think about your algorithm and logic, you will find that you do not need a break.

Of course, if your loop ends up having a whole bunch of exit conditions, then you need to either rethink your algorithm or use breaks. I tend to really sit and think about my algorithms before deciding to use a break. When you write code and you get that voice in your head that says There has to be a better way than this!, you know it's time to rewrite your algorithm.

As far as loops are concerned, I use for loops when I want to iterate over a collection of items with a known bound and I know that there are no other exit conditions. I use a while loop when I want to run a series of statements over and over again until something happens. I usually use while loops if I am searching for something. In this case, I don't break out of the while. Instead, I use a flag because I think the while loop in its entirety reads more like English and I can tell what it's doing. For example:

int i = 0;
bool found = false;
while(i < count && !found) {
   found = (value == items[i]);
   i++;
}

The way I read that in English in my head is While i is less than the total count of items and nothing is found. That, versus:

for(int i = 0; i < count; i++) {
   if(value == items[i]) {
      break;
   }
}

I actually find that a bit harder to read because I can't tell immediately from the first line what the loop is actually doing. When I see a for loop, I think Ok, I'm running over a known list of items no matter what. But then I see an if in that loop and then a break inside the if block. What this means is that your loop has two exit conditions, and a while would be a better choice. That being said, don't do this either:

int i = 0;
while(i < count) {
   if(value == items[i]) {
      break;
   }
   i++;
}

That's not much better than the for with the break. To sum it all up, I'd say use break as a last resort, and only if you are sure that it actually will make your code easier to read.


Answer by Steve Jessop (Apr 17, 2010)

Those who claim that it is bad say that it's a "get out of jail free card" and that you can always avoid using it.

They may claim that, but if that's their actual argument then it's nonsense. I'm not in jail, and as a programmer I'm not in the business of avoiding things just because they can be avoided. I avoid things if they're harmful to some desired property of my program.

There's a lot of good discussion here, but I suggest that your professors are not idiots (even if they're wrong), and they probably have some reason in mind when they say not to use break. It's probably better to ask them what this is, and pay attention to the answer. You've given your "guess", but I propose to you that they would state their case better than you state it. If you want to learn from them, ask them to explain the exact details, purpose, and benefits of their coding guideline. If you don't want to learn from them, quit school and get a job ;-)

Admittedly, I don't agree with their case as you've stated it, and neither do many others here. But there's no great challenge in knocking down your straw-man version of their argument.


Answer by Jonah (Jun 26, 2010)

Here is a good use-case in PHP:

foreach($beers as $beer) {
    $me->drink($beer);

    if ($me->passedOut) {
        break; //stop drinking beers
    }
}

Answer by OregonGhost (Oct 19, 2008)

I also learned at uni that any functions should have only a single point of exit, and the same of course for any loops. This is called structured programming, and I was taught that a program must be writable as a structogram because then it's a good design.

But every single program (and every single structogram) I saw in that time during lectures was ugly, hardly readable, complex and error-prone. The same applies to most loops I saw in those programs. Use it if your coding guidelines require it, but in the real world, it's not really bad style to use a break, multiple returns or even continue. Goto has seen much more religious wars than break.


Answer by Jay Bazuzi (Oct 19, 2008)

Most of the uses of break are about stopping when you've found an item that matches a criterion. If you're using C#, you can step back and write your code with a little more intent and a little less mechanism.

Then loops like this:

foreach (var x in mySequence)
{
    if (SomeCriteria(x))
    {
        break;
    }
}

start to look like:

from x in mySequence
where SomeCriteria(x)
select x

If you are iterating with while because the thing you're working on isn't an IEnumerable<T>, you can always make it one:

    public static IEnumerable<T> EnumerateList<T>(this T t, Func<T, T> next)
    {
        while (t != null)
        {
            yield return t;
            t = next(t);
        }
    }
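The same idea carries over to other languages with generators. As a sketch in Python (the Node class, the walk function, and the sample data are all hypothetical), a generator turns a "follow the next pointer" structure into something iterable, and the built-in next with a default picks the first match without an explicit loop or break.

```python
class Node:
    """Hypothetical node with a parent link, used only for this demo."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent


def walk(start, step):
    """Yield start, step(start), step(step(start)), ... until step gives None."""
    current = start
    while current is not None:
        yield current
        current = step(current)


root = Node("root")
child = Node("child", parent=root)
leaf = Node("leaf", parent=child)

# First node on the way up whose value starts with "c", or None if there is none.
match = next(
    (n for n in walk(leaf, lambda n: n.parent) if n.value.startswith("c")),
    None,
)
```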

Answer by DJClayworth (Oct 21, 2008)

The rule makes sense only in theory. In theory for loops are for when you know how many iterations there are, and while loops are for everything else. But in practice when you are accessing something for which sequential integers are the natural key, a for loop is more useful. Then if you want to terminate the loop before the final iteration (because you've found what you are looking for) then a break is needed.

Obey your teacher's restriction while you are writing assignments for him. Then don't worry about it.


Answer by tvanfosson (Oct 19, 2008)

I understand the issue. In general you want to have the loop condition define the exit conditions and have loops only have a single exit point. If you need proof of correctness for your code these are invaluable. In general, you really should try to find a way to keep to these rules. If you can do it in an elegant way, then your code is probably better off. However, when your code starts to look like spaghetti and all the gymnastics of trying to maintain a single exit point get in the way of readability, then opt for the "wrong" way of doing it.

I have some sympathy for your instructor. Most likely he just wants to teach you good practices without confusing the issue with the conditions under which those practices can be safely ignored. I hope that the sorts of problems he's giving you easily fit into the paradigm he wants you to use and thus failing you for not using them makes sense. If not, then you get some experience dealing with jerks and that, too, is a valuable thing to learn.


Answer by Thomas Padron-McCarthy (Mar 17, 2009)

Another view: I've been teaching programming since 1986, when I was a teaching assistant for the first time in a Pascal course, and I've taught C and C-like languages since, I think, 1991. And you would probably not believe some of the abuses of break that I have seen. So I perfectly understand why the original poster's university outlaws it. It is also a good thing to teach students that just because you can do something in a language, that doesn't mean that you should. This comes as a surprise to many students. Also, that there is such a thing as coding standards, and that they may be helpful -- or not.

That aside, I agree with many other posters that even if break can make code worse, it can also make it better, and, like any other rule, the no-breaks rule can and (sometimes) should be broken, but only if you know what you're doing.


Answer by Pulsehead

You are still in school. Time to learn the most important mantra that colleges require of you:
Cooperate and Graduate.

It's good that your school has a guideline, as any company you work for (worth a plugged nickel) will also have a coding guideline for whatever language you will be coding in. Follow your guideline.


Answer by Samuel (Apr 17, 2010)

I don't think there is anything wrong with using breaks. I could see how using a break can be seen as skipping over code but it's not like a goto statement where you could end up anywhere. break has better logic, "just skip to the end of the current block".

slightly off topic...(couldn't resist)
http://xkcd.com/292/


Answer by Cervo

@lodle.myopenid.com

In your answer the examples do not match. Your logic is as follows in the example in the question and example A in your answer:

while X != 0 loop
   set y
   if y == 0 exit loop
   set z
   if z == 0 exit loop
   do a large amount of work
   if some_other_condition exit loop
   do even more work
   x = x -1

Example B:

while X != 0 loop
  set y
  if y == 0
    set z
  elseif z == 0
    do a large amount of work
  elseif (some_other_condition)
    do even more work
  x--

This is absolutely not the same. And this is exactly why you need to think about using break.

First of all in your second example you probably meant for the if var == 0 to be if var != 0, that is probably a typo.

  1. In the first example if y or z is 0 or the other condition is met you will exit the loop. In the second example you will continue the loop and decrement x = x - 1. This is different.
  2. You used if and else if. In the first example you set y, then check y, then set z then check z, then you check the other condition. In the second example you set y and then check y. Assuming you changed the check to y != 0 then if y is not 0 you will set z. However you use else if. You will only check Z != 0 (assuming you changed it) if y == 0. This is not the same. The same argument holds to other stuff.

So basically given your two examples the important thing to realize is that Example A is completely different from Example B. In trying to eliminate the break you completely botched up the code. I'm not trying to insult you or say you are stupid. I'm trying to overemphasize that the two examples don't match and the code is wrong. And below I give you the example of the equivalent code. To me the breaks are much easier to understand.

The equivalent of example A is the following

  done = 0;
  while X != 0 && !done {
    set y
    if y != 0 {
      set z
      if z != 0 {
        do large amount of work
        if NOT (some_other_condition) {
          do even more work
          x = x - 1
        } else
          done = 1;
      } else
        done = 1;
    } else
      done = 1;
  }

As you can see what I wrote is completely different from what you wrote. I'm pretty sure mine is right but there may be a typo. This is the problem with eliminating breaks. A lot of people will do it quickly like you did and generate your "equivalent code" which is completely different. That's why frankly I'm surprised a software engineering class taught that. I would recommend that both you and your professor read "Code Complete" by Steve McConnell. See http://cc2e.com/ for various links. It's a tough read because it is so long. And even after reading it twice I still don't know everything in it. But it helps you to appreciate many software implementation issues.
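One way to gain confidence in a rewrite like this is to run both versions side by side. The following is a sketch in Python under stated assumptions: get_y, get_z and other_condition are hypothetical callables standing in for "set y", "set z" and some_other_condition, and the log entries stand in for the work done. It mirrors example A (with breaks) and the flag-based equivalent given above.

```python
def loop_with_breaks(x, get_y, get_z, other_condition):
    """Example A: three early exits via break."""
    log = []
    while x != 0:
        y = get_y(x)
        if y == 0:
            break
        z = get_z(x)
        if z == 0:
            break
        log.append(("work", x))
        if other_condition(x):
            break
        log.append(("more work", x))
        x -= 1
    return log, x


def loop_with_flag(x, get_y, get_z, other_condition):
    """The break-free equivalent: one flag, three levels of nesting."""
    log = []
    done = False
    while x != 0 and not done:
        y = get_y(x)
        if y != 0:
            z = get_z(x)
            if z != 0:
                log.append(("work", x))
                if not other_condition(x):
                    log.append(("more work", x))
                    x -= 1
                else:
                    done = True
            else:
                done = True
        else:
            done = True
    return log, x
```

Feeding both functions the same inputs and comparing the returned log and final x gives a quick check that no exit path was lost in the translation.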


Answer by Dustin Getz (Oct 21, 2008)

break typically does make loops less readable. Once you introduce breaks, you can no longer treat the loop as a black box.

while (condition)
{
   asdf
   if (something) break;
   adsf
}

cannot be factored to:

while (condition) DoSomething();

From Code Complete:

A loop with many breaks may indicate unclear thinking about the structure of the loop or its role in the surrounding code. Excessive breaks often indicate that the loop could be more clearly expressed as a series of loops. [1]

Use of break eliminates the possibility of treating a loop as a black box. Control a loop's exit condition with one statement to simplify your loops. 'break' forces the person reading your code to look inside to understand the loop's control, making the loop more difficult to understand. [1]

  1. McConnell, Steve. Code Complete, Second Edition. Microsoft Press © 2004. Chapter 16.2: Controlling the Loop.

Answer by Robert Rossney (Oct 21, 2008)

Out of curiosity, I took a little tour of the codebase I'm working on - about 100,000 lines of code - to see how I'm actually using this idiom.

To my surprise, every single usage was some version of this:

foreach (SomeClass x in someList)
{
   if (SomeTest(x))
   {
      found = x;
      break;
   }
}

Today, I'd write that:

SomeClass found = someList.Where(x => SomeTest(x)).FirstOrDefault();

which, through the miracle of LINQ deferred execution, is the same thing.

In Python, it would be:

try:
   found = next(x for x in someList if SomeTest(x))
except StopIteration:
   found = None

(It seems like there should be a way to do that without catching an exception, but I can't find a Python equivalent of FirstOrDefault.)
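For what it's worth, Python's built-in next accepts an optional second argument that is returned when the iterator is exhausted, so no exception handling is needed. A minimal sketch (the helper name first_or_default is hypothetical, borrowed from the LINQ method for readability):

```python
def first_or_default(iterable, predicate, default=None):
    """Return the first element satisfying predicate, or default if none does."""
    return next((x for x in iterable if predicate(x)), default)


found = first_or_default([1, 4, 6], lambda x: x % 2 == 0)
missing = first_or_default([1, 3, 5], lambda x: x % 2 == 0)
```

Because the generator expression is lazy, the search stops at the first match, just like the loop with break.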

But if you're not using a language that supports this kind of mechanism, then of course it's OK to use the break statement. How else are you going to find the first item in a collection that passes a test? Like this?

SomeClass x = null;
for (int i = 0; i < someList.Length && x == null; i++)
{
   if (SomeTest(someList[i]))
   {
      x = someList[i];
   }
}

I think break is just a wee bit less crazy.


Answer by Personman

In general, anything that makes execution jump around and isn't a function call has the potential to make your code more confusing and harder to maintain. This principle first gained widespread acceptance with the publication of Dijkstra's Go To Statement Considered Harmful article in 1968.

break is a more controversial case, since there are many common use cases and it is often pretty clear what it does. However, if you're reading through a three- or four-deep nested loop and you stumble upon a break (or a continue), it can be almost as bad. Still, I use it sometimes, as do many others, and it's a bit of a personal issue. See also this previous StackOverflow question: Continue Considered Harmful?


Answer by Brian Gianforcaro (Apr 17, 2010)

I believe your professors are just trying (wisely) to instil good coding practices in you. Breaks, gotos, exit(), etc. can often be the cause behind extraneous bugs throughout code from people new to programming who don't really have a true understanding of what's going on.

It's good practice just for readability to avoid introducing possible extra entrances and exits in a loop/code path. So the person who reads your code won't be surprised when they didn't see the break statement and the code doesn't take the path they thought it would.


Answer by Pavel Radzivilovsky

In the real world, few people care about style. However, breaking out of a loop is an okay thing even under the strictest coding guidelines, such as those of Google, the Linux kernel and CppCMS.

The idea of discouraging break comes from a famous book, Structured Programming by Dijkstra http://en.wikipedia.org/wiki/Structured_programming that was the first one to discourage goto. It suggested an alternative to goto, and suggested principles which might have misled your professors.

Since then, a lot has changed. Nobody seriously believes in one point of return, but the goto, a popular tool at the time of the book, was defeated.


Answer by S.Lott

In the real world, I look at every break statement critically as a potential bug. Not an actual bug, but a potential bug. I challenge the programmers I work with on every break statement to justify its use. Is it more clear? Does it have the expected results?

Every statement (especially every composite statement) has a post-condition. If you can't articulate this post-condition, you can't really say much about the program.

Example 1 -- easy to articulate.

while not X:
   blah blah blah
assert X

Pretty easy to check that this loop does what you expected.

Example 2 -- harder to articulate.

while not X:
   blah
   if something I forgot: 
      break
   blah blah
   if something else that depends on the previous things:
      break
   blah
assert -- what --?
# What's true at this point?  X?  Something?  Something else?
# What was done?  blah?  blahblah?

Not so easy to say what the post-condition is at the end of that loop. Hard to know if the next statements will do anything useful.

Sometimes (not always, just sometimes) break can be bad. Other times, you can make the case that you have a loop which is simpler with a break. If so, I challenge programmers to provide a simple, two-part proof: (1) show the alternative and (2) provide some bit of reasoning that shows the post-conditions are precisely the same under all circumstances.
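A small instance of that two-part proof might look like the following Python sketch (the function names are hypothetical, and the asserts are only there to state the post-condition explicitly). The version with break and the alternative without it end with the same assertion, which is the bit of reasoning being asked for.

```python
def find_index_with_break(items, target):
    """Linear search using break, with the post-condition spelled out."""
    index = 0
    while index < len(items):
        if items[index] == target:
            break
        index += 1
    # Post-condition: we ran off the end, or index points at target.
    assert index == len(items) or items[index] == target
    return index if index < len(items) else -1


def find_index_without_break(items, target):
    """The alternative: both exit conditions live in the loop header."""
    index = 0
    while index < len(items) and items[index] != target:
        index += 1
    # Same post-condition as the version with break.
    assert index == len(items) or items[index] == target
    return index if index < len(items) else -1
```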

Some languages have features that are ill-advised. It's a long-standing issue with language design. C, for example, has a bunch of constructs that are syntactically correct, but meaningless. These are things that basically can't be used, even though they're legal.

break is on the hairy edge. Maybe good sometimes. Maybe a mistake other times. For educational purposes, it makes sense to forbid it. In the real world, I challenge it as a potential quality issue.


Shared with attribution, where reasonably possible, per the SO attribution policy and cc-by-something. If you were the author of something I posted here, and want that portion removed, just let me know.


Being aware of how sites reel you in... and hook you 🎣


I think most of us have a general sense of uneasiness with the firm grasp the most popular sites on the Internet have on us, but it's no mistake in design that they're popular... or that the cause of uneasiness is also the salve.

Post a few thoughts and feelings to Twitter, like a few tweets and follow someone in the hope they'll reciprocate, wonder why they didn't. Check Facebook to see what old friends are up to, marvel that you're nothing like them anymore, silently judge them while hoping not to be silently judged. Check Instagram, feel jealous of someone's carefully crafted photo, unaware that someone else is jealous of yours.

Scroll the feed/timeline/whatever to see if you missed anything, just once more, okay twice. Check the news for stories that confirm your world view, make you feel more "normal" in their outlandishness, or just set you on edge. Flip back to Twitter for funny cat videos, see a political post, feel your blood pressure rise. Back to Facebook to scroll again. Rinse, repeat, rinse, repeat.

Or as Nir Eyal puts it, get an itch, scratch it, get another itch, scratch it again, over and over. And you're hooked, the ultimate goal of what he terms "behavioral design".

Photo by Daria Nepriakhina / Unsplash

Cut the line... at least for a while

When I used Twitter and Facebook (I don't anymore), I didn't like how I worried about feedback. Twitter, especially, is heavily used by peers in my industry, and who doesn't want the respect of their peers? I feared missing an update. I loved and hated the ups and downs of positive and negative feedback, wondering if someone would validate what I shared, criticize it, or just tear into it.

If you feel the same way, you're not alone. In fact, I'd challenge you to pick a month (as a New Year's resolution?), post a message telling everyone you're doing a social media detox (who isn't doing a detox of some sort anyway after a dozen holiday parties), disable notifications, and sign out. See how you feel. For me, it was uncomfortable for a while, like I was missing out. It passed.


Understand the (al)lure

I'm on a kick now to understand why these sites are so alluring to me. Why do I use them, why do I miss them, is it by design (yes) or merely chance (no way)?

I mentioned Nir Eyal's book above. I won't do it justice here, so when you take your month off, I recommend reading it. He lays out a solid method for getting users hooked on your app, and you'll find yourself, as I did, thinking about the various apps you use and how they do exactly what Nir describes, getting you to invest your own time in it, providing variable rewards like a slot machine, etc. It's enlightening and annoying to see how easy it is to be manipulated. We only have so many keystrokes (and mouse scrolls!) left, and someone's profiting off them.

replace "scientists" with "programmers"

Even better, he goes into the ethics of whether you should do it at all, about carefully thinking of what exactly your app achieves and whether or not it's a good solution to the itch it scratches. For example, people feel the loneliness itch, and Facebook or Twitter scratch it with an endless feed and variable rewards for contributing - is that the right cure for what ails us? 🤔


Endless feed(back)

Speaking of endless feeds, there's a great article by Rob Marvin titled "The Endless Scroll". Part of it's about using tech so much you forget to eat, sleep and work, but there's some really insightful stuff in there for anyone to understand about human nature in general. What follows are some of the more enlightening quotes. I'd be shocked if most of us didn't identify with at least some of these.

From Dr. David Greenfield, a psychologist studying and treating tech addiction:

"We feel constantly overwhelmed, because we're hypervigilant in responding to a million channels of information and communication, all of which emanate out of a device that we hold in our hands, that's with us 24/7. It's become an accessory to our life in a way that we've never seen before; it's a conduit through which we function and experience our lives. That has never existed in the history of humankind."

Another quote, attributed to research Dr. Natasha Dow Schüll is doing into addiction, describes what she calls "ludic loops" (yeah, weird name): the comfort (or is it a mild high?) you feel when engaged in a repetitive activity that gives you occasional rewards.

Ludic loops occur when you pick up a smartphone and start scrolling. You flick through Facebook or Twitter, read some posts, check your email or Slack, watch a few Instagram stories, send a Snap or two, reply to a text, and end up back on Twitter to see what you've missed.

Before you know it, 20 or 30 minutes has gone by; often longer. These experiences are designed to be as intuitive as possible; you can open and start using them without spending too much time figuring out how they work.

Another great quote, again from Dr. Greenfield. Of course, I have no idea what he's talking about... do you?

[W]e've become a "boredom-intolerant culture," using tech to fill every waking moment - sometimes at the expense of organic creativity or connecting with someone else in a room. When was the last time you took public transportation or sat in a waiting room without pulling out a smartphone?

Photo by Jens Johnsson / Unsplash

From Adam Alter, another psychologist studying these things. This is part of the reason the next book on my list is The Art of Screen Time.

"I think it's really important that kids are exposed to social situations in the real world, rather than just through a screen where there's this delayed feedback. It's about seeing your friend when you talk to them; seeing the reactions on their face," said Alter. "The concern is that putting people in front of screens during the years where they really need to interact with real people may never fully acquire those social skills.

Ultimately though, it's up to us to cut ties with certain tech, instead of taking to social media (oh, the irony) when a company creates something we crave. Fortunately, the process of discovering how to write an addictive app means we're simultaneously discovering how to protect ourselves from addictive apps, thanks to the work of people like Nir Eyal. Once we realize we're being duped, we tend to hit back hard.


Tools don't use themselves

One last thought, from Arianna Huffington (who wrote a popular focus app).

Technology is just a tool - it's not inherently good or bad. It's about how we use it and what it does for our lives. So phones can be used to enhance our lives or consume them. And though it sounds paradoxical, there's actually more and more technology that helps us unplug from technology. That kind of human-centered technology is one of the next tech frontiers.

Her argument is the same for any tool - a hammer can be misused to hurt someone, or it can be used to build a home for someone without one. It's always been about humans helping or hurting one another... not the tool itself.

Photo by Hunter Haley / Unsplash

If you're looking for a cliffs-notes version, someone put together a nice summary, which I think I'll hang on to for reference:

A summary of the book "Hooked: How to build habit-forming products"


3 Ansible playbooks, 2 DO droplets, and a website... in a pear tree 🎄


I started looking at Ansible last week, after finding some good intro articles by Erika Heidi. Here's the one I followed in the last post.

How to Use Ansible to Automate Initial Server Setup on Ubuntu | DigitalOcean
Ansible offers a simple architecture that doesn't require special software to be installed on nodes. It also provides a robust set of features and built-in modules which facilitate writing automation scripts. This guide explains how to use Ansible to automate the steps contained in our Initial Serve…

If you followed how I set things up in my other post, then after the script creates the "sammy" user you still won't be able to log in because you don't know the password. Just log in to the remote host as "root", run sudo passwd sammy, and you're golden. Obviously, it'd be better if I automated that part too, but whatever... this is for play.

Today I'm running through another of Erika's posts, which includes some sample playbooks to run. Plus I created a few of my own pointless playbooks.

Configuration Management 101: Writing Ansible Playbooks | DigitalOcean
This tutorial will walk you through the process of creating an automated server provisioning using Ansible, a configuration management tool that provides a complete automation framework and orchestration capabilities. We will focus on the language terminology, syntax and features necessary for creat…

Create a file and change the modification date

I created my first playbook with only two tasks. It creates an empty file using the file module, then makes it look old by changing the modification stamp.

---
- hosts: all

  tasks:
    - name: Create an empty file because reasons
      file:
        path: ~/sample_file.txt
        state: touch

    - name: Change the modification time of the empty file
      file:
        path: ~/sample_file.txt
        # Quoted so YAML keeps it a string instead of reading it as a float
        modification_time: "199902042120.30"

Run it with the -u flag to make it run as "sammy", and then verify that the file has been on your server for 20 years. :p

ansible-playbook my_first_playbook/playbook.yml -u sammy
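As a quick sanity check on what that timestamp string means, here is a small Python sketch. It assumes the file module's default modification_time_format is %Y%m%d%H%M.%S (worth confirming against the docs for your Ansible version), and the helper name is made up for illustration.

```python
from datetime import datetime

# Assumed default for the file module's modification_time_format.
ANSIBLE_TIME_FORMAT = "%Y%m%d%H%M.%S"


def parse_modification_time(stamp: str) -> datetime:
    """Parse a YYYYMMDDHHMM.SS string into a naive datetime."""
    return datetime.strptime(stamp, ANSIBLE_TIME_FORMAT)


when = parse_modification_time("199902042120.30")
print(when.isoformat())  # 1999-02-04T21:20:30
```

Under that assumption, the playbook's value reads as February 4, 1999 at 21:20:30, which is what you would expect to see in the file's modification time on the remote host.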

Install and remove packages

This task was taken from Erika's article, installing or updating 3 packages to the latest version. Then I added a task to remove git by setting its state to absent. I can't believe how nicely Ansible abstracts away the underlying scripts it must be running to do what it does. 👍

---
- hosts: all

  tasks:
    - name: Update some packages
      # Installing or upgrading packages needs root, same as removing them
      become: yes
      apt: name={{ item }} state=latest
      with_items:
        - vim
        - git
        - curl

    - name: Remove a package
      become: yes
      apt: name=git state=absent

Spin up a website using Apache

You'll want to copy the contents of Erika's ansible folder in the following repo.

erikaheidi/cfmgmt
Configuration Management Guide. Contribute to erikaheidi/cfmgmt development by creating an account on GitHub.

If you followed my setup using 2 DigitalOcean droplets, you'll need to add a task to allow port 80 (see below).

---
- hosts: all
  become: true
  vars:
    doc_root: /var/www/example
  tasks:
    - name: Update apt
      apt: update_cache=yes

    - name: Install Apache
      apt: name=apache2 state=latest

    - name: Create custom document root
      file: path={{ doc_root }} state=directory owner=www-data group=www-data

    - name: Set up HTML file
      copy: src=index.html dest={{ doc_root }}/index.html owner=www-data group=www-data mode=0644

    - name: Allow all access to tcp port 80
      ufw:
        rule: allow
        port: '80'
        proto: tcp

    - name: Set up Apache virtual host file
      template: src=vhost.tpl dest=/etc/apache2/sites-available/000-default.conf
      notify: restart apache
  handlers:
    - name: restart apache
      service: name=apache2 state=restarted

Here are the results. I colorized each area of output to make it easier to understand.

  • The red area shows that the only open port was for SSH, but the Ansible script configured it to allow port 80 as well (purple area).
  • The blue and green areas show the web page and apache config file, respectively.
  • The yellow area shows that apache2 has been up for nearly 7 minutes. Apache wasn't restarted on this run because I had already run the playbook several times and the Apache conf file hadn't changed, so the 'Set up Apache virtual host file' task reported no change and never notified the 'restart apache' handler... at least that's how I understand it.
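
That reading matches how handlers are documented to behave: a handler only runs when a task that notifies it reports a change. The Python sketch below is a toy model of that rule, not Ansible's actual implementation; the Task class, run_play function, and the vhost content are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    name: str
    apply: Callable[[Dict[str, str]], bool]  # returns True if it changed state
    notify: Optional[str] = None             # handler to queue on change


def run_play(tasks: List[Task], state: Dict[str, str]) -> List[str]:
    """Run tasks in order and return the handlers that would fire."""
    queued: List[str] = []
    for task in tasks:
        changed = task.apply(state)
        if changed and task.notify and task.notify not in queued:
            queued.append(task.notify)
    return queued


def write_vhost(state: Dict[str, str]) -> bool:
    """Idempotent 'template' step: only writes when the content differs."""
    desired = "DocumentRoot /var/www/example"
    if state.get("vhost") == desired:
        return False
    state["vhost"] = desired
    return True


server: Dict[str, str] = {}
play = [Task("Set up Apache virtual host file", write_vhost, notify="restart apache")]

first_run = run_play(play, server)   # config is new, so the handler is queued
second_run = run_play(play, server)  # nothing changed, so no restart
print(first_run, second_run)
```

In this model the first run queues the restart and the second run queues nothing, which mirrors the uptime in the yellow area.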

And finally, opening the little index.html page I created, which the Ansible controller node copied to the remote host. Success!


The power of Ansible is easy to see. So far, I've only played around with pushing changes to a single remote host, but I could easily spin up more droplets, modify the /etc/ansible/hosts file on the controller node (pasted below) to include them, and push out a website (or anything else I want) to every machine at once. 🤯

# This is the default ansible 'hosts' file.
#
# It should live in /etc/ansible/hosts
#
#   - Comments begin with the '#' character
#   - Blank lines are ignored
#   - Groups of hosts are delimited by [header] elements
#   - You can enter hostnames or ip addresses
#   - A hostname/ip can be a member of multiple groups

[servers]
server1 ansible_host=64.225.30.45

[servers:vars]
ansible_python_interpreter=/usr/bin/python3